Developer Tools

llama.cpp b9400 fixes Gemma 4 projector normalization

Gemma 4 support gets a critical fix in the latest llama.cpp release.

Deep Dive

The open-source llama.cpp project has released build b9400, which fixes a critical bug in its Gemma 4 model support. The change (commit title "mtmd: fix gemma 4 projector pre_norm") corrects the pre-normalization layer of the Gemma 4 projector in mtmd, llama.cpp's multimodal library. Because the projector maps encoder output into the language model's embedding space, a wrong normalization step there can distort everything the model sees downstream of it. With the fix in place, users running Google's latest Gemma 4 family of models (including the 2B and 9B variants) on llama.cpp should get correct outputs.
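To make "projector pre-norm" concrete, here is a minimal, generic sketch in Python. It is not llama.cpp's code and does not reproduce the actual Gemma 4 projector: the RMS-style normalization, the function names (rms_norm, linear, project), and the toy weights are all assumptions chosen for illustration. It only shows why skipping or misapplying the normalization before the projection changes what the language model receives.

```python
import math

def rms_norm(vec, weight, eps=1e-6):
    """Scale vec to unit root-mean-square, then apply a per-channel weight.
    (RMS normalization is an assumption here, not confirmed for Gemma 4.)"""
    mean_sq = sum(x * x for x in vec) / len(vec)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [x * scale * w for x, w in zip(vec, weight)]

def linear(vec, matrix):
    """Multiply a row-major matrix (out_dim x in_dim) by vec."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def project(encoder_embedding, norm_weight, proj_matrix, use_pre_norm=True):
    """Hypothetical projector: optional pre-norm, then a linear map."""
    if use_pre_norm:
        hidden = rms_norm(encoder_embedding, norm_weight)
    else:
        hidden = encoder_embedding
    return linear(hidden, proj_matrix)

# Toy values, purely illustrative.
embedding = [3.0, 4.0]
norm_weight = [1.0, 1.0]
proj_matrix = [[1.0, 0.0], [0.0, 1.0]]  # identity, so only the norm matters

with_norm = project(embedding, norm_weight, proj_matrix)
without_norm = project(embedding, norm_weight, proj_matrix, use_pre_norm=False)

print("with pre-norm:   ", with_norm)
print("without pre-norm:", without_norm)
```

In this toy case the two results differ only in scale, but in a real model the projected vectors feed directly into the language model, so a scale or weighting mismatch at this step can degrade every token generated from the affected input.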

The new release is available as pre-built binaries for a wide range of platforms: macOS (Apple Silicon arm64, with and without KleidiAI acceleration; Intel x64; iOS XCFramework), Linux (x64 and arm64 CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (x64/arm64 CPU, CUDA 12/13, Vulkan, HIP), and Android arm64. Users can also build from source. The quick fix reflects the community's rapid iteration on new model support, which helps keep llama.cpp a go-to inference engine for local LLM deployment.

Key Points
  • Fixes Gemma 4 projector pre-normalization for accurate model inference
  • Pre-built binaries for 15+ platforms including Apple Silicon, Linux, Windows, Android
  • Part of the ongoing rapid iteration on llama.cpp, which has 114k GitHub stars

Why It Matters

Ensures developers and enthusiasts can run Google's latest Gemma 4 models correctly on local hardware.