b8717
The latest update removes the end-of-generation token for Gemma 4 models and adds new builds for Vulkan, ROCm, and OpenVINO.
The open-source community behind llama.cpp, the C++ inference engine powering countless local AI applications, has released version b8717. This update primarily addresses compatibility with Google's recently released Gemma 4 family of models by removing the end-of-generation (EOG) token from the vocabulary, a crucial fix for proper text generation. Beyond this model-specific patch, the release significantly broadens hardware support, introducing new build targets that let developers deploy models on a wider array of systems.
For Linux users, the release adds pre-built binaries for Ubuntu with Vulkan API support (both x64 and arm64) and, notably, for ROCm 7.2, AMD's open software platform for GPU computing. A new OpenVINO build option targets Intel hardware acceleration. Windows users gain CUDA 12.4 and 13.1 DLL variants, while macOS/iOS builds now include a 'KleidiAI enabled' version for Apple Silicon, promising optimized performance. The release commit carries GitHub's verified signature, underscoring the project's continued focus on security and trusted updates for its massive 103k-star repository.
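For those building from source rather than using the pre-built binaries, the new backends map onto llama.cpp's standard CMake backend options. A minimal sketch, assuming a working toolchain and the Vulkan SDK installed (the `GGML_VULKAN` and `GGML_HIP` flag names follow the project's build documentation; exact ROCm and OpenVINO setup varies by system):

```shell
# Fetch the source and configure with the Vulkan backend enabled.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
# For AMD GPUs via ROCm, use -DGGML_HIP=ON instead.
cmake --build build --config Release -j
```

The same pattern applies to the other backends: one configure-time flag selects the GPU runtime, and the resulting binaries land under `build/bin`.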
- Adds critical vocabulary fix for Google's Gemma 4 models by removing the EOG token
- Expands GPU backend support with new builds for Vulkan, ROCm 7.2, and OpenVINO
- Introduces KleidiAI-accelerated binaries for macOS/iOS on Apple Silicon for better performance
Why It Matters
This update lowers the barrier to running the latest open models efficiently on diverse hardware, from AMD GPUs to Intel chips and Apple Silicon.