A reasoning budget bug fix was merged into llama.cpp (PR #21697), correcting a core feature for controlled internal reasoning?

A reasoning budget bug fix was merged into llama.cpp (PR #21697), correcting a core feature for controlled internal reasoning.

Google released new chat templates for all Gemma 4 IT variants (31B, 27B, E4B, E2B) on Hugging Face to fix broken tool/function calling?

Google released new chat templates for all Gemma 4 IT variants (31B, 27B, E4B, E2B) on Hugging Face to fix broken tool/function calling.

Users must apply new `.jinja` templates manually or redownload updated GGUF files to gain the fixes, as shown in the detailed server config example?

Users must apply new `.jinja` templates manually or redownload updated GGUF files to gain the fixes, as shown in the detailed server config example.

Open Source

Google's Gemma 4 gets critical fixes for reasoning budgets and tool calling

r/LocalLLaMA April 11, 2026

⚡Two major patches released in 24 hours fix core functionality for the open-source AI model.

Deep Dive

Google's open-source Gemma 4 models have received two significant patches within 24 hours, addressing core functionality issues that were impacting developer adoption. The first fix, merged into the popular llama.cpp inference engine via GitHub pull request #21697, resolves a bug with the model's 'reasoning budget' parameter. This feature is crucial for controlling the length of internal 'chain-of-thought' reasoning, a key differentiator for advanced models. Simultaneously, Google released updated chat templates (Jinja files) on Hugging Face for the Gemma 4 31B, 27B, E4B, and E2B instruction-tuned variants, specifically fixing incorrect formatting that broke tool and function calling capabilities.

For users running these models locally, the update path requires action. Developers must either manually apply the new `.jinja` chat template files using a command-line argument like `--chat-template-file`, or they must redownload the complete GGUF model files that have been rebuilt in the last day to include the corrected template. The provided configuration example shows how to implement the fix for the 26B model, enabling features like thinking with a 4096-token reasoning budget. These rapid fixes highlight the collaborative nature of open-source AI, where community feedback on platforms like GitHub and Hugging Face leads to swift improvements in model stability and usability for complex tasks like agentic AI.

Key Points

A reasoning budget bug fix was merged into llama.cpp (PR #21697), correcting a core feature for controlled internal reasoning.
Google released new chat templates for all Gemma 4 IT variants (31B, 27B, E4B, E2B) on Hugging Face to fix broken tool/function calling.
Users must apply new `.jinja` templates manually or redownload updated GGUF files to gain the fixes, as shown in the detailed server config example.

Why It Matters

These fixes are essential for developers building reliable AI agents and applications with Gemma 4's advanced reasoning and tool-use features.

Read Original Article

Google's Gemma 4 gets critical fixes for reasoning budgets and tool calling

Why It Matters

Related Articles

🚀 Stay Ahead in AI