Developer Tools

b8852

The latest update to the popular open-source inference engine expands hardware compatibility to 28 distinct platforms.

Deep Dive

The ggml-org team behind the massively popular llama.cpp project has released a significant new version, b8852. This update to the high-performance, open-source C++ inference engine, originally built around Meta's Llama models and now supporting a broad range of open LLMs, focuses on dramatically expanding hardware compatibility and refining server functionality. The key server change renames the `--clear-idle` command-line argument to the more descriptive `--cache-idle-slots`, improving clarity for developers tuning slot and cache behavior in multi-user deployments.
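For illustration, here is a hedged sketch of what that rename looks like at the command line. The model path and port are placeholders, and whether `--cache-idle-slots` takes a value or acts as a bare switch isn't specified in the release notes quoted here, so it is shown as a bare flag:

```sh
# Before b8852, idle-slot handling was configured via the old flag:
#   llama-server -m ./models/model.gguf --port 8080 --clear-idle
# From b8852 on, the same option goes by its new, more descriptive name.
# NOTE: model path and port are placeholders; the flag's exact argument
# form (switch vs. value) is an assumption, not confirmed by the notes.
llama-server -m ./models/model.gguf --port 8080 --cache-idle-slots
```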

The major news, though, is the explosion of supported platforms. The release now ships pre-built binaries for 28 distinct hardware/OS combinations: macOS (Apple Silicon and Intel), Linux (x64, arm64, and s390x CPUs, plus new Vulkan, ROCm 7.2, and OpenVINO backends), Windows (x64/arm64 CPU, CUDA 12/13, Vulkan, SYCL, HIP), Android, and even specialized builds for the openEuler Linux distribution paired with Huawei's Ascend 310P/910B AI processors. This move turns llama.cpp from a tool for specific setups into a near-universal runtime for efficient LLM inference across the entire hardware spectrum, from mobile devices to data center GPUs and specialized AI accelerators.
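To make that "near-universal runtime" claim concrete: the command-line interface is the same across backends, and only the downloaded binary changes per platform. A minimal sketch using the standard `llama-cli` flags (`-m`, `-p`, `-n`, `-ngl`), with placeholder model paths:

```sh
# Same invocation, different pre-built binary per platform
# (all model paths below are placeholders).

# macOS (Apple Silicon): Metal acceleration is used by default.
./llama-cli -m ./models/model.gguf -p "Hello" -n 64

# Linux with the Vulkan or ROCm build: offload layers to the GPU.
./llama-cli -m ./models/model.gguf -p "Hello" -n 64 -ngl 99

# Windows CUDA build: same flags, different artifact.
# llama-cli.exe -m .\models\model.gguf -p "Hello" -n 64 -ngl 99
```

The design point is that the acceleration backend is baked into each binary at build time, so moving between a laptop, a CUDA workstation, and an Ascend server is a matter of swapping the download, not rewriting commands.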

Key Points
  • Version b8852 renames the server argument `--clear-idle` to `--cache-idle-slots` for better clarity in resource management.
  • Adds pre-built binaries for 28 platforms, including new Vulkan, ROCm 7.2, OpenVINO, SYCL, and expanded CUDA support.
  • Extends reach to niche hardware like openEuler with Huawei Ascend chips, making efficient LLM inference nearly universal.

Why It Matters

Dramatically lowers the barrier to running efficient, local LLMs by supporting virtually any hardware a developer or company might have.