Developer Tools

llama.cpp b9413 improves hash mixing and CUDA JIT dispatch

MurmurHash3 mixer and runtime PTX version checking for better hash distribution and more reliable CUDA dispatch

Deep Dive

The b9413 release of llama.cpp fixes a subtle CUDA dispatch bug that appears under JIT compilation. Previously, the dispatcher relied solely on the __CUDA_ARCH_LIST__ macro, which does not distinguish between architecture variants such as sm_90, sm_90a, and sm_90f. As a result, forward-JIT code could select PDL (Programmatic Dependent Launch) kernels on architectures that do not support them. The fix reads cudaFuncAttributes::ptxVersion at runtime to confirm that the kernel's PTX version is compatible before dispatching, so the mismatch no longer fails silently.
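The runtime check described above can be sketched in plain C++. This is a minimal illustration, not the llama.cpp implementation: KernelAttrs is a hypothetical stand-in for the ptxVersion field of cudaFuncAttributes (which the CUDA runtime fills in via cudaFuncGetAttributes), the device_cc encoding is an assumption, and the threshold of 90 assumes PDL needs PTX targeting compute capability 9.0 or newer.

```cpp
// Hypothetical stand-in for the one field of cudaFuncAttributes used here.
// In real CUDA code this value would come from cudaFuncGetAttributes().
struct KernelAttrs {
    int ptxVersion;  // encoded as major * 10 + minor, e.g. 90 for compute_90
};

// Assumption: PDL is only usable when the kernel was built from PTX
// targeting compute capability 9.0 or newer.
constexpr int kMinPtxForPdl = 90;

// Decide per kernel, at runtime, whether the PDL launch path is safe.
// device_cc uses the same major * 10 + minor encoding (an assumption
// made for this sketch).
constexpr bool can_use_pdl(const KernelAttrs& attrs, int device_cc) {
    // Both the device and the PTX the kernel was actually built from
    // must meet the minimum; a compile-time arch list cannot tell us
    // the second part.
    return device_cc >= kMinPtxForPdl && attrs.ptxVersion >= kMinPtxForPdl;
}
```

Assuming ptxVersion reflects the PTX target the kernel was built from, a kernel compiled to compute_89 PTX and JIT-compiled on an sm_90 device would report 89, so this check would steer it away from the PDL path even though the device itself supports the feature.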

The second major change adds a MurmurHash3 mixer for hash distribution inside llama.cpp. Its magic constants are taken from Boost's container_hash library, which gives the project a well-tested mixing function. Better mixing spreads keys more evenly across hash values, which can improve performance in hash-based operations such as tokenization and model weight lookups. The release also includes code-formatting consistency fixes and addresses review comments from the community.
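To show what a MurmurHash3-style mixer does, here is a minimal sketch of the canonical 64-bit MurmurHash3 finalizer. It is an illustration only: the constants below are the ones from the reference MurmurHash3 code, and the constants llama.cpp adopted from Boost may differ. The bucket_for helper is hypothetical and exists only to show why mixing matters.

```cpp
#include <cstdint>

// Canonical 64-bit finalizer ("fmix64") from the MurmurHash3 reference code.
// Assumption: shown for illustration; llama.cpp's Boost-derived constants
// may not match these.
constexpr std::uint64_t fmix64(std::uint64_t k) {
    k ^= k >> 33;                // fold the high bits into the low bits
    k *= 0xff51afd7ed558ccdULL;  // odd multiplier spreads low bits upward
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// Hypothetical helper: map a key to one of bucket_count buckets.
// bucket_count must be a power of two so the mask keeps the low bits.
constexpr std::uint64_t bucket_for(std::uint64_t key, std::uint64_t bucket_count) {
    return fmix64(key) & (bucket_count - 1);
}
```

Without mixing, keys such as 0x1000, 0x2000, and 0x3000 would all land in bucket 0 of a 16-entry table, because their low four bits are zero. The mixer lets every input bit influence the low bits that select the bucket.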

Key Points
  • Fixes CUDA JIT dispatch by checking PTX version at runtime instead of relying on __CUDA_ARCH_LIST__
  • Implements MurmurHash3 mixer using Boost's magic constants for better hash distribution
  • Affects CUDA builds across architecture targets, including sm_89, sm_90a, and forward-JIT mode

Why It Matters

Improves the reliability and performance of local LLM inference across NVIDIA GPUs, especially where CUDA kernels are JIT-compiled from PTX.