b8109
The latest commit patches a critical MMQ shader push constant error affecting multi-dispatch operations on Vulkan.
The ggml-org team released build b8109 of the open-source llama.cpp project. This update fixes a Vulkan shader push constant bug (issue #19732) that affected MMQ (quantized matrix multiplication) shaders and multi-dispatch operations. The release also ships pre-built binaries for Windows (CUDA 12/13, Vulkan, SYCL, HIP), macOS (Apple Silicon and Intel), Linux, and iOS. Developers running quantized Llama models on the Vulkan backend should see more reliable results.
Why It Matters
Fixes a bug in the Vulkan compute shaders used for quantized matrix multiplication, making local LLM inference more stable for developers on Vulkan-capable GPUs.