Ollama v0.30.5-rc0 fixes Gemma 4 12B crash on x86/CUDA
Ollama bumps llama.cpp to b9509, squashing a divide-by-zero bug in Gemma 4 12B.
Ollama has published release candidate v0.30.5-rc0, a focused update that bumps the bundled llama.cpp backend to build b9509. The primary change is an upstream fix for the Gemma 4 12B multimodal projector, which crashed with a divide-by-zero error when n_head was zero. The crash affected x86 processors and CUDA GPUs on both Linux and Windows, preventing users from running Gemma 4 12B locally. The fix resolves five open GitHub issues (#16479, #16489, #16491, #16492, #16495) reported since the model was added.
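To make the failure mode concrete, here is a minimal Python sketch of how a zero head count can turn into a divide-by-zero. It assumes the per-head width is derived by dividing an embedding width by n_head, a common pattern in transformer code. The actual llama.cpp projector code is C++ and may be structured differently, and the names head_dim, head_dim_checked, n_embd, and the sample sizes are illustrative, not taken from the patch.

```python
def head_dim(n_embd: int, n_head: int) -> int:
    """Per-head width as an unguarded integer division (hypothetical helper)."""
    return n_embd // n_head  # raises ZeroDivisionError when n_head == 0


def head_dim_checked(n_embd: int, n_head: int) -> int:
    """Same computation, but rejects a non-positive head count up front."""
    if n_head <= 0:
        raise ValueError(f"n_head must be positive, got {n_head}")
    return n_embd // n_head


if __name__ == "__main__":
    # Illustrative sizes only; not the real Gemma 4 12B hyperparameters.
    print(head_dim_checked(1024, 8))
    try:
        head_dim(1024, 0)
    except ZeroDivisionError as exc:
        print("unguarded division failed:", exc)
```

Python surfaces the problem as an exception. In C or C++, integer division by zero is undefined behavior, and on x86 it typically traps and terminates the process, which is consistent with the hard crash described above.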
For developers running Ollama for local inference or experimentation, this update removes a significant roadblock. Gemma 4 12B is a multimodal model (text and images), but the crash made it unusable on many common hardware configurations. With this patch, Gemma 4 12B should run reliably on x86/CUDA setups. Because the build is tagged as a release candidate, further validation is expected before a stable v0.30.5 release. The fix arrived within days of the crash reports.
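Since the fix ships in v0.30.5-rc0, one practical check is whether a local install is at least that version. The sketch below parses version strings of the shape used in this article and compares them. The helper names parse_version and has_fix are hypothetical, the ordering rule (a release candidate sorts before the final release of the same number) is a simplifying assumption, and the code does not query Ollama itself.

```python
import re


def parse_version(text: str) -> tuple:
    """Parse 'v0.30.5-rc0' or '0.30.5' into a tuple that sorts correctly."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)(?:-rc(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    rc = match.group(4)
    # A final release (1, 0) sorts after any release candidate (0, n).
    pre = (1, 0) if rc is None else (0, int(rc))
    return (major, minor, patch) + pre


def has_fix(installed: str, fixed_in: str = "0.30.5-rc0") -> bool:
    """True if the installed version is at or after the release with the fix."""
    return parse_version(installed) >= parse_version(fixed_in)


if __name__ == "__main__":
    for version in ("0.30.4", "0.30.5-rc0", "v0.30.5"):
        print(version, has_fix(version))
```

The input would be the version number reported by `ollama --version`, with any surrounding text stripped first.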
- Ollama v0.30.5-rc0 updates llama.cpp to b9509, incorporating upstream Gemma 4 12B fixes.
- Fixes a divide-by-zero crash (n_head=0) that affected x86 CPUs and CUDA GPUs on Linux and Windows.
- Resolves five reported issues: #16479, #16489, #16491, #16492, and #16495.
Why It Matters
Local LLM users can now run Gemma 4 12B without crashes on x86/CUDA, restoring key multimodal capabilities.