llama.cpp b9400 fixes Gemma 4 projector normalization
Gemma 4 support gets a critical fix in the latest llama.cpp release.
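The open-source llama.cpp project has released b9400, a patch release that fixes a critical bug in its Gemma 4 support. The fix targets the pre-normalization layer of the Gemma 4 projector (mtmd: fix gemma 4 projector pre_norm). In llama.cpp, mtmd is the multimodal library, and the projector maps encoder outputs for non-text inputs such as images into the language model's embedding space, so the bug would affect multimodal inputs rather than text-only generation. With the fix, users running Google's latest Gemma 4 family of models (including the 2B and 9B variants) on llama.cpp should get correct outputs for those inputs.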
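To illustrate why a pre-normalization step matters, here is a minimal sketch of a projector that normalizes an embedding before projecting it. It assumes RMS normalization and uses hypothetical names (rms_norm, project, projector_with_pre_norm) and toy values; the release notes do not describe the actual change, and this is not the llama.cpp implementation.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale x to roughly unit root-mean-square, then apply per-channel weights.

    RMS normalization is an assumption made for this sketch.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def project(x, matrix):
    """Multiply vector x by a matrix given as a list of rows (out_dim x in_dim)."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def projector_with_pre_norm(x, norm_weight, matrix):
    """Hypothetical projector: normalize first, then project."""
    return project(rms_norm(x, norm_weight), matrix)

# Toy encoder output with a large magnitude (illustrative values only)
embedding = [30.0, -40.0, 0.0, 50.0]
norm_weight = [1.0, 1.0, 1.0, 1.0]
matrix = [
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 1.0, 1.0, 0.0],
]

with_norm = projector_with_pre_norm(embedding, norm_weight, matrix)
without_norm = project(embedding, matrix)

print("with pre-norm:   ", with_norm)
print("without pre-norm:", without_norm)
```

In this toy case, skipping the normalization leaves the projected values far larger than with it. A language model trained on normalized projector outputs would see inputs on the wrong scale, which is the kind of error that can degrade results without raising any exception.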
The new release is available as pre-built binaries for a wide range of platforms: macOS (Apple Silicon arm64, with and without KleidiAI acceleration; Intel x64; iOS XCFramework), Linux (x64 and arm64 CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (x64/arm64 CPU, CUDA 12/13, Vulkan, HIP), and Android arm64. Users can also build from source. This release demonstrates the community's rapid iteration on supporting new models, ensuring llama.cpp remains a go-to inference engine for local LLM deployment.
- Fixes Gemma 4 projector pre-normalization for accurate model inference
- Pre-built binaries for 15+ platforms including Apple Silicon, Linux, Windows, Android
- Part of the ongoing rapid iteration on llama.cpp, which has 114k GitHub stars
Why It Matters
The fix ensures that developers and enthusiasts can run Google's latest Gemma 4 models correctly on local hardware.