Developer Tools

b8570

The latest commit to the popular 99.8k-star project enables a key compatibility feature for open-source reasoning models.

Deep Dive

The open-source engine behind efficient local AI inference, llama.cpp, has pushed a significant new commit. Tagged b8570, the update from the ggml-org team adds a small but important piece of compatibility: support for `reasoning_format = none` in its GPT-OSS implementation. The setting matters for interfacing correctly with a growing class of open-source models built around explicit "reasoning" or chain-of-thought output. With the flag in place, llama.cpp can load and serve these models without format errors, preserving the project's performance and broad hardware support for developers who depend on them.
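On the server side, this corresponds to llama.cpp's `--reasoning-format` option. A minimal sketch of serving a GPT-OSS model with the new setting might look like the following (the GGUF path, host, and port are placeholders, and flag availability depends on your llama.cpp build being recent enough):

```shell
# Start llama.cpp's OpenAI-compatible server with a GPT-OSS model.
# --reasoning-format none asks the server to pass the model's
# chain-of-thought output through verbatim rather than parsing it
# into a separate field. The model path below is a placeholder.
./llama-server \
  -m ./models/gpt-oss-20b.gguf \
  --host 127.0.0.1 --port 8080 \
  --reasoning-format none
```

Other values of the flag (such as `auto`) instruct the server to detect and extract the reasoning section instead of leaving it inline.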

This update underscores llama.cpp's role as a universal runtime for the open-source AI ecosystem. With over 99.8k stars on GitHub, the project ships pre-built binaries for a wide range of platforms, from macOS on Apple Silicon and Windows with CUDA 12.4 to Linux with ROCm 7.2 and specialized builds for openEuler. The addition of `reasoning_format = none` is a targeted but important fix that keeps the engine compatible with the cutting edge of model development: researchers and application builders can deploy the latest reasoning-capable models, such as GPT-OSS and similar chain-of-thought architectures, locally on everything from laptops to servers, on CPU or accelerated backends like Vulkan and HIP.

Key Points
  • Commit b8570 adds `reasoning_format = none` parameter support to GPT-OSS in llama.cpp, fixing compatibility for certain reasoning models.
  • The update is part of the massively popular 99.8k-star project that provides cross-platform binaries for macOS, Windows, Linux, and iOS.
  • Enables local deployment of advanced open-source models that emit explicit reasoning output, across diverse hardware backends (CPU, CUDA, Vulkan, ROCm).
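For client code, the effect is visible through llama-server's OpenAI-compatible API. A hedged sketch of a chat-completion request against a locally running server (host and port are assumptions, and exact response fields vary by server version):

```shell
# Query a llama-server assumed to be listening on localhost:8080 and
# started with --reasoning-format none. With "none", any chain-of-thought
# text the model emits stays inline in the response's "content" field;
# with reasoning parsing enabled, recent server versions would instead
# split it into a separate "reasoning_content" field.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "What is 17 * 24? Think step by step."}
        ]
      }'
```

Clients that expect raw, unparsed model output can therefore keep working unchanged when pointed at a reasoning-tuned model.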

Why It Matters

This keeps the essential llama.cpp engine compatible with the latest open-source reasoning models, ensuring developers can run state-of-the-art AI locally.