

llama.cpp b9491 fixes PDL race conditions with architecture-aware restrict

llama.cpp Releases June 03, 2026

⚡Eliminates concurrency bugs while preserving performance on older GPU architectures.

Deep Dive

The latest release of llama.cpp, b9491, fixes a race condition that occurred when kernels were launched with PDL (Programmatic Dependent Launch, a CUDA feature that lets a dependent kernel start executing before the kernel it depends on has finished). The root cause was the `restrict` qualifier in the kernel headers. `restrict` tells the compiler that a pointer's data is not accessed through any other pointer, which can allow it to reorder or cache loads in ways that become unsafe once two kernels overlap in time. To fix this, the developers removed `restrict` from the PDL kernel headers entirely. To avoid a performance regression on older GPU architectures, where `restrict` still enables worthwhile optimizations, they added architecture-specific preprocessor directives that reintroduce `restrict` in the kernel body only on architectures known to benefit from it. A new macro also simplifies the use of `restrict` across the codebase, making future maintenance easier. The fix was contributed by Oliver Simons from NVIDIA, with subsequent updates adding support for Hopper GPUs.
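A minimal sketch of the pattern described above, written as host-compilable C++ that stands in for a CUDA kernel. The macro name `RESTRICT_IF_SAFE`, the function `scale_sketch`, and the `__CUDA_ARCH__ >= 900` cutoff (compute capability 9.0, where PDL becomes available) are illustrative assumptions, not llama.cpp's actual code. The signature carries no `restrict`; the body re-declares the pointers with the qualifier only when the preprocessor check allows it.

```cpp
#include <cstddef>

// Hypothetical macro, not llama.cpp's real name. Assumption: PDL is only in
// play for device code compiled for compute capability 9.0+, so restrict is
// dropped there and kept everywhere else (including this host-side build,
// where __CUDA_ARCH__ is not defined).
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
#define RESTRICT_IF_SAFE
#else
#define RESTRICT_IF_SAFE __restrict__
#endif

// Hypothetical stand-in for a kernel. The signature (what a header would
// declare) has no restrict qualifier on its pointer parameters.
void scale_sketch(const float * x, float * y, std::size_t n, float s) {
    // Restrict is reintroduced only inside the body, and only when the
    // macro above expands to __restrict__.
    const float * RESTRICT_IF_SAFE xr = x;
    float * RESTRICT_IF_SAFE yr = y;
    for (std::size_t i = 0; i < n; ++i) {
        yr[i] = s * xr[i];
    }
}
```

Because `__restrict__` is a promise to the compiler rather than a runtime check, callers must still guarantee that `x` and `y` do not overlap whenever the macro expands to the qualifier. In a real CUDA kernel the function would be marked `__global__` and the loop replaced by per-thread indexing, while the preprocessor logic would stay the same. `__restrict__` is accepted by GCC, Clang, and nvcc.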

This release also ships build artifacts for a wide range of platforms: macOS (Apple Silicon and Intel), Linux (x86, ARM64, and s390x, with CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL builds), Windows (x64 and ARM64, with CPU, CUDA 12/13, Vulkan, and HIP builds), and Android (ARM64). The macOS KleidiAI, Linux SYCL FP32, and openEuler builds are disabled in this release. Users upgrading from earlier versions should see improved stability when running inference with PDL, without losing speed on older hardware.

Key Points

Removes restrict keyword from PDL kernel headers to fix race conditions
Adds architecture-specific preprocessor directives to retain restrict performance on older hardware
Includes builds for macOS, Linux (CPU, Vulkan, ROCm, OpenVINO), Windows (CUDA, Vulkan, HIP), and Android

Why It Matters

Ensures stable LLM inference with PDL across diverse hardware, crucial for self-hosted deployments.
