llama.cpp b9393 fixes Gemma 4 audio with RMS norm epsilon tweak
Open-source LLM runner patches audio model parameter to improve accuracy.
The llama.cpp project, a widely used C/C++ implementation for running large language models locally, has shipped a maintenance release (tag b9393) with a targeted fix for Gemma 4 audio processing. The change, titled "mtmd: fix gemma 4 audio rms norm eps," corrects the epsilon value used by RMS normalization in the Gemma 4 audio model. Epsilon is the small constant added to the mean of squared activations before the square root is taken. It prevents division by zero, and because it sets a floor on the normalization denominator, a value that differs from the one the model was trained with shifts the scale of low-energy activations. A seemingly minor constant can therefore affect the stability and quality of audio processing in multimodal models. The fix was co-authored by Sigbjørn Skjæret and merged into the main branch.
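To make the role of epsilon concrete, here is a minimal sketch of RMS normalization in plain Python. It is not the llama.cpp implementation: the function name, the input values, and both epsilon values are assumptions chosen for illustration, and the learned per-channel weight that real models apply after normalization is omitted.

```python
import math

def rms_norm(x, eps):
    """Divide each element of x by sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]

# Hypothetical low-energy activations, e.g. from a near-silent audio frame.
quiet = [1e-3, -2e-3, 1.5e-3, -0.5e-3]

# Two illustrative epsilon values; neither is taken from the patch.
for eps in (1e-6, 1e-3):
    out = rms_norm(quiet, eps)
    peak = max(abs(v) for v in out)
    print(f"eps={eps:g} -> largest normalized magnitude = {peak:.4f}")
```

With these inputs the mean square is on the order of 1e-6, so the larger epsilon dominates the denominator and the normalized values shrink sharply. Inputs with a much larger mean square would barely notice the difference, which is why a wrong epsilon tends to show up only on quiet or low-magnitude signals.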
The release includes pre-built binaries for the major platforms: macOS (Apple Silicon and Intel), Linux (x64 and arm64, with backends including Vulkan, ROCm, and OpenVINO), and Windows (CPU, CUDA, Vulkan, and HIP). An Android arm64 build is also provided. Users running Gemma 4 for audio tasks on any of these targets can pick up the corrected behavior by updating to b9393. Because llama.cpp is widely used for local and edge deployment, small patches like this one matter to developers and researchers who depend on its output matching the reference model.
- Fixes the RMS norm epsilon used for Gemma 4 audio in the mtmd module, a change intended to improve audio model stability.
- Release b9393 includes builds for macOS, Linux, Windows, and Android, with GPU backends such as CUDA and Vulkan.
- Co-authored by Sigbjørn Skjæret, the patch corrects a normalization parameter in multimodal audio processing.
Why It Matters
Helps Gemma 4 models process audio correctly when run locally via llama.cpp.