llama.cpp b8113
The latest release fixes a critical XML tool-call detection bug and adds thinking support for newer reasoning models.
Deep Dive
The ggml-org team released llama.cpp build b8113, a key update to the popular open-source inference engine. It fixes format detection for the Step-3.5-Flash model, preventing crashes by correctly routing its XML-style tool calls to the Nemotron v3 PEG parser. The update also adds 'thinking_forced_open' support, which lets models separate reasoning content from final answers in API responses, a capability essential for advanced agentic workflows.
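As a sketch of what this separation looks like from an API client's perspective: llama.cpp's server exposes an OpenAI-compatible chat-completions endpoint, and reasoning text is commonly returned in a separate field alongside the final answer. The JSON below is a hand-written sample illustrating that shape (the field name 'reasoning_content' and the response content are assumptions for illustration, not captured output from b8113):

```python
import json

# Hand-written sample of a chat-completion response in the shape an
# OpenAI-compatible server might use when thinking content is split
# out from the final answer (field names assumed for illustration).
sample_response = json.loads("""
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "The user asks for 2+2; compute the sum.",
        "content": "2 + 2 = 4"
      }
    }
  ]
}
""")

message = sample_response["choices"][0]["message"]
# The model's internal reasoning, kept separate from the answer.
thinking = message.get("reasoning_content", "")
# The final, user-facing answer.
answer = message["content"]

print("thinking:", thinking)
print("answer:", answer)
```

A client can then show only 'content' to end users while logging or auditing the reasoning channel separately, which is the point of keeping the two streams apart.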
Why It Matters
Developers can now reliably run newer reasoning models like Step-3.5-Flash locally for complex, multi-step AI agent tasks without parsing errors or crashes.