b8717
The latest update removes the end-of-generation token for Gemma 4 models and adds new builds for Vulkan, ROCm, and OpenVINO.
The open-source community behind llama.cpp, the C++ inference engine powering countless local AI applications, has released version b8717. This update primarily addresses compatibility with Google's recently released Gemma 4 family of models by removing the end-of-generation (EOG) token from the vocabulary, a crucial fix for proper text generation. Beyond this model-specific patch, the release significantly broadens hardware support, introducing new build targets that let developers deploy models on a wider array of systems.
For Linux users, the release adds pre-built binaries for Ubuntu with Vulkan API support (both x64 and arm64) and, notably, for ROCm 7.2, AMD's open software platform for GPU computing. A new OpenVINO build option targets Intel hardware acceleration. Windows users gain CUDA 12.4 and 13.1 DLL variants, while macOS/iOS builds now include a 'KleidiAI enabled' version for Apple Silicon, promising optimized performance. The release commit carries GitHub's verified signature, underscoring the project's continued focus on security and trusted updates for its massive 103k-star repository.
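For those building from source rather than using the pre-built binaries, the new backends map onto llama.cpp's standard CMake backend options. A minimal sketch, assuming a working toolchain and the Vulkan SDK installed (the `GGML_VULKAN` and `GGML_HIP` flag names follow the project's build documentation; exact ROCm and OpenVINO setup varies by system):

```shell
# Fetch the source and configure with the Vulkan backend enabled.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
# For AMD GPUs via ROCm, use -DGGML_HIP=ON instead.
cmake --build build --config Release -j
```

The same pattern applies to the other backends: one configure-time flag selects the GPU runtime, and the resulting binaries land under `build/bin`.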
- Adds critical vocabulary fix for Google's Gemma 4 models by removing the EOG token
- Expands GPU backend support with new builds for Vulkan, ROCm 7.2, and OpenVINO
- Introduces KleidiAI-accelerated binaries for macOS/iOS on Apple Silicon for better performance
Why It Matters
This update lowers the barrier to running the latest open models efficiently on diverse hardware, from AMD GPUs to Intel chips and Apple Silicon.