llama.cpp b9413 improves hash mixing and CUDA JIT dispatch

llama.cpp Releases May 30, 2026

⚡ A MurmurHash3 hash mixer and runtime PTX version checking for CUDA JIT dispatch

Deep Dive

The b9413 release of llama.cpp fixes a subtle CUDA dispatch bug that appears when kernels are JIT-compiled. Previously, the dispatcher relied solely on the __CUDA_ARCH_LIST__ macro, which does not differentiate between architecture variants such as sm_90, sm_90a, and sm_90f. As a result, forward-JIT code could incorrectly launch PDL (Programmatic Dependent Launch) kernels on unsupported architectures. The fix reads cudaFuncAttributes::ptxVersion at runtime to confirm that the PTX version the kernel was compiled from is compatible, preventing errors that would otherwise go unreported.
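The release notes summarized here do not include the patch itself. The following is therefore only a minimal sketch of the runtime decision. It is written as plain C++ so the logic can be read and checked without a GPU. It assumes that PDL needs compute capability 9.0 on both the device and the PTX virtual architecture the kernel was compiled from. KernelDispatchInfo, kPdlMinArch, and use_pdl are hypothetical names, not llama.cpp's code. The comments show where the real CUDA runtime values (cudaFuncGetAttributes, cudaFuncAttributes::ptxVersion, cudaDeviceProp::major/minor) would come from.

```cpp
// Inputs to the dispatch decision. In CUDA host code they would be read at
// runtime, for example:
//   cudaFuncAttributes attr;
//   cudaFuncGetAttributes(&attr, kernel);
//   ptx_version = attr.ptxVersion;              // 80 for compute_80 PTX
//   cudaDeviceProp prop;
//   cudaGetDeviceProperties(&prop, device);
//   device_cc = prop.major * 10 + prop.minor;   // 89 for sm_89
struct KernelDispatchInfo {
    int ptx_version;  // PTX virtual architecture the kernel was built from
    int device_cc;    // compute capability of the GPU it will run on
};

// Assumed threshold: PDL requires compute capability 9.0 or newer.
constexpr int kPdlMinArch = 90;

// Hypothetical helper: take the PDL launch path only when the device AND the
// PTX this particular kernel came from are both new enough. A kernel shipped
// as compute_80 PTX and JIT-compiled on an sm_90 GPU is rejected, even though
// the device itself supports PDL.
constexpr bool use_pdl(KernelDispatchInfo info) {
    return info.device_cc >= kPdlMinArch && info.ptx_version >= kPdlMinArch;
}
```

The point of querying ptxVersion is that it describes the specific kernel being launched rather than the whole build. A check based on it therefore stays meaningful when a binary mixes native code for some architectures with PTX that the driver JIT-compiles for newer ones.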

The second major change adds a MurmurHash3 mixer for hash distribution inside llama.cpp. Its magic constants were taken from Boost's container_hash library, a widely used and well-tested source of hash mixing. Better-distributed hashes can improve performance in hash-based operations such as tokenization and model weight lookups. The release also includes code-formatting consistency fixes and addresses review comments from the community.
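For illustration only, here is a sketch of a MurmurHash3-style mixing step. It uses the MurmurHash3 x86_32 block step, whose constants (0xcc9e2d51, 0x1b873593, 0xe6546b64) also appear in the 32-bit hash_combine of older Boost container_hash releases. Whether llama.cpp uses this exact step, or a 64-bit variant, is an assumption. The names murmur3_combine and hash_keys are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Rotate a 32-bit value left by r bits (requires 0 < r < 32).
constexpr std::uint32_t rotl32(std::uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

// Mix one 32-bit key k into the running hash h using the MurmurHash3
// x86_32 block step (the same constants appear in older Boost container_hash).
constexpr std::uint32_t murmur3_combine(std::uint32_t h, std::uint32_t k) {
    constexpr std::uint32_t c1 = 0xcc9e2d51u;
    constexpr std::uint32_t c2 = 0x1b873593u;
    k *= c1;             // scramble the key
    k = rotl32(k, 15);
    k *= c2;
    h ^= k;              // fold it into the running hash
    h = rotl32(h, 13);
    h = h * 5u + 0xe6546b64u;
    return h;
}

// Hypothetical helper: fold a sequence of 32-bit keys into one hash value.
constexpr std::uint32_t hash_keys(const std::uint32_t* keys, std::size_t n,
                                  std::uint32_t seed = 0) {
    std::uint32_t h = seed;
    for (std::size_t i = 0; i < n; ++i) {
        h = murmur3_combine(h, keys[i]);
    }
    return h;
}
```

For a fixed running hash, each step is a bijection in k. The multipliers c1, c2, and 5 are odd, and rotations and XOR are invertible. As a result, two different keys never collide within a single step, while the rotations and multiplications spread each input bit across the output.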

Key Points

Fixes CUDA JIT dispatch by checking PTX version at runtime instead of relying on __CUDA_ARCH_LIST__
Implements MurmurHash3 mixer using Boost's magic constants for better hash distribution
Applies across CUDA targets, including sm_89, sm_90a, and forward-JIT mode

Why It Matters

Improves local LLM reliability and performance across NVIDIA GPUs, especially when CUDA kernels are JIT-compiled from PTX.
