b8744
The latest commit fixes a critical bug that prevented Gemma 4's advanced reasoning capabilities from activating.
llama.cpp, the widely used open-source C++ inference engine for running models such as Llama and Gemma locally, has landed a significant update in commit b8744. The commit fixes a bug (#21487) that prevented the "reasoning budget sampler" from activating with Google's recently released Gemma 4 model: the `thinking_start_tag` and `thinking_end_tag` parameters were missing from the model's initialization function, so the sampler had no way to delimit the model's reasoning block, and the structured reasoning pathway never engaged.
Without this fix, users running Gemma 4 through llama.cpp were missing a key feature: the model's ability to allocate a computational "budget" for internal reasoning steps before producing a final answer. The update also modifies the PEG grammar parser to correctly handle edge cases, such as a reasoning budget of zero. This keeps advanced reasoning techniques like chain-of-thought working reliably, which matters for complex logic, math, and coding tasks. The fix is available across all of llama.cpp's platform builds, from macOS on Apple Silicon to Windows with CUDA.
- Fixes critical bug #21487 that blocked Gemma 4's reasoning budget sampler from activating.
- Adds required `thinking_start_tag` and `thinking_end_tag` parameters to common chat initialization for Gemma 4.
- Enables proper use of structured reasoning (chain-of-thought) for complex problem-solving on all supported hardware.
Why It Matters
Unlocks Gemma 4's full reasoning potential for developers, enabling more accurate and logical outputs for coding, math, and analysis tasks locally.