llama.cpp b8113
The latest release fixes a critical XML tool-call detection bug and adds thinking support for newer reasoning models.
Deep Dive
The ggml-org team released llama.cpp build b8113, a key update to the popular open-source inference engine. It fixes format detection for the Step-3.5-Flash model, preventing crashes by correctly routing its XML-style tool calls to the Nemotron v3 PEG parser. The update also adds 'thinking_forced_open' support, which lets models separate reasoning content from final answers in API responses, a capability essential for advanced agentic workflows.
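As a sketch of what this separation looks like from an API client's perspective: llama.cpp's server exposes an OpenAI-compatible chat-completions endpoint, and reasoning text is commonly returned in a separate field alongside the final answer. The JSON below is a hand-written sample illustrating that shape (the field name 'reasoning_content' and the response content are assumptions for illustration, not captured output from b8113):

```python
import json

# Hand-written sample of a chat-completion response in the shape an
# OpenAI-compatible server might use when thinking content is split
# out from the final answer (field names assumed for illustration).
sample_response = json.loads("""
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "The user asks for 2+2; compute the sum.",
        "content": "2 + 2 = 4"
      }
    }
  ]
}
""")

message = sample_response["choices"][0]["message"]
# The model's internal reasoning, kept separate from the answer.
thinking = message.get("reasoning_content", "")
# The final, user-facing answer.
answer = message["content"]

print("thinking:", thinking)
print("answer:", answer)
```

A client can then show only 'content' to end users while logging or auditing the reasoning channel separately, which is the point of keeping the two streams apart.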
Why It Matters
Developers can now reliably run newer reasoning models like Step-3.5-Flash locally for complex, multi-step AI agent tasks without parsing errors or crashes.