b8744
The latest commit fixes a critical bug that prevented Gemma 4's advanced reasoning capabilities from activating.
llama.cpp, the widely used open-source C++ inference engine for running models such as Llama and Gemma locally, has landed a significant update in commit b8744. The commit fixes a bug (#21487) that prevented the "reasoning budget sampler" from activating with Google's recently released Gemma 4 model: the `thinking_start_tag` and `thinking_end_tag` parameters were missing from the model's initialization function, so the sampler had no way to delimit the model's reasoning block, and the structured reasoning pathway never engaged.
Without this fix, users running Gemma 4 through llama.cpp were missing a key feature: the model's ability to allocate a computational "budget" for internal reasoning steps before producing a final answer. The update also modifies the PEG grammar parser to correctly handle edge cases, such as a reasoning budget of zero. This keeps advanced reasoning techniques like chain-of-thought working reliably, which matters for complex logic, math, and coding tasks. The fix is available across all of llama.cpp's platform builds, from macOS on Apple Silicon to Windows with CUDA.
- Fixes critical bug #21487 that blocked Gemma 4's reasoning budget sampler from activating.
- Adds required `thinking_start_tag` and `thinking_end_tag` parameters to common chat initialization for Gemma 4.
- Enables proper use of structured reasoning (chain-of-thought) for complex problem-solving on all supported hardware.
Why It Matters
Unlocks Gemma 4's full reasoning potential for developers, enabling more accurate and logical outputs for coding, math, and analysis tasks locally.