llama.cpp build b9118 fixes Vulkan shared memory, expands platform support

llama.cpp Releases May 12, 2026

⚡110k-star open-source LLM project updates GPU shaders and adds ARM64 builds...

Deep Dive

The open-source llama.cpp project, which has amassed over 110,000 GitHub stars and 18,100 forks, released build b9118 on May 12. The commit, signed with GitHub's verified GPG key, fixes a Vulkan backend bug by properly checking the shared memory size required by matrix-multiply-quantized (MMQ) shaders (issue #22693). A Vulkan compute shader can only use as much workgroup shared memory as the device reports it supports, so checking an MMQ shader's requirement against that limit should make GPU inference more stable on Vulkan-capable devices with smaller shared memory budgets.
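The general idea behind a shared memory check like this can be sketched in a few lines. The Python below is a hypothetical illustration, not llama.cpp's code: the tile names, tile sizes, and per-element byte counts are assumptions made up for the example. Only `maxComputeSharedMemorySize`, a real field of Vulkan's `VkPhysicalDeviceLimits`, comes from the Vulkan API. The sketch computes how much shared memory a tiled matmul shader would need to stage one tile of each input and then selects the largest tile configuration the device can actually hold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TileConfig:
    """Hypothetical tile shape for an MMQ-style matmul shader (illustrative only)."""
    name: str
    bm: int            # rows of the A (weight) tile staged in shared memory
    bn: int            # columns of the B (activation) tile staged in shared memory
    bk: int            # tile depth along the shared K dimension
    a_elem_bytes: int  # assumed bytes per staged A element
    b_elem_bytes: int  # assumed bytes per staged B element


def shared_mem_bytes(cfg: TileConfig) -> int:
    """Shared memory needed to hold one A tile and one B tile at the same time."""
    a_tile = cfg.bm * cfg.bk * cfg.a_elem_bytes
    b_tile = cfg.bn * cfg.bk * cfg.b_elem_bytes
    return a_tile + b_tile


def pick_tile(configs: Sequence[TileConfig], max_shared_mem: int) -> Optional[TileConfig]:
    """Return the first config whose requirement fits the device limit, else None.

    `configs` is assumed to be ordered from largest to smallest tile, so the
    first fit is the biggest tile the device can actually run.
    """
    for cfg in configs:
        if shared_mem_bytes(cfg) <= max_shared_mem:
            return cfg
    return None


# Illustrative configurations, largest first (all numbers are made up).
CONFIGS = [
    TileConfig("large", bm=128, bn=128, bk=64, a_elem_bytes=2, b_elem_bytes=4),  # 49152 bytes
    TileConfig("medium", bm=64, bn=64, bk=32, a_elem_bytes=2, b_elem_bytes=4),   # 12288 bytes
    TileConfig("small", bm=32, bn=32, bk=32, a_elem_bytes=2, b_elem_bytes=4),    # 6144 bytes
]

if __name__ == "__main__":
    # In a real Vulkan program the limit would be read from
    # VkPhysicalDeviceLimits::maxComputeSharedMemorySize.
    for limit in (49152, 32768, 4096):
        cfg = pick_tile(CONFIGS, limit)
        print(limit, cfg.name if cfg else "no MMQ tile fits")
```

Ordering the candidates from largest to smallest keeps the biggest tile on devices with generous limits while still giving GPUs with less shared memory a configuration that fits. What a real backend should do when nothing fits (for example, routing the multiplication through a different kernel) is a design decision this sketch leaves open.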

The release expands platform coverage significantly. Precompiled binaries are now available for macOS Apple Silicon (both standard and with KleidiAI acceleration), macOS Intel, iOS as an XCFramework, Linux on x64/arm64/s390x with various backends (Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Windows x64/arm64 with CUDA 12/13 and Vulkan/SYCL/HIP, Android arm64, and openEuler (x86 and aarch64 with ACL Graph). This breadth makes llama.cpp one of the most cross-platform LLM inference engines, enabling developers and enthusiasts to run models locally on everything from Raspberry Pi to high-end desktop GPUs.

Key Points

Fixed shared memory size calculation for MMQ shaders in Vulkan backend (issue #22693)
Added prebuilt binaries for macOS Apple Silicon with KleidiAI, Linux s390x, Android arm64, Windows arm64
Supports 10+ platform/backend combinations including CUDA 12/13, ROCm 7.2, OpenVINO, SYCL, HIP

Why It Matters

Broadens local LLM inference to more hardware, fixing a key GPU stability issue for Vulkan users.
