Developer Tools

b8909

New release improves Anthropic API conversion and expands pre-built binaries to iOS, Android, and more...

Deep Dive

The latest release of llama.cpp, version b8909, introduces a key server update: the `convert_anthropic_to_oai` function now also copies `chat_template_kwargs`, the extra parameters a request can pass to the model's chat template. This change streamlines interoperability by enabling Anthropic API requests to be transparently converted into OpenAI-compatible format without dropping per-request template options, making it easier for developers to use Anthropic clients with llama.cpp servers.
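
To make the change concrete, here is a minimal Python sketch of the idea. The actual `convert_anthropic_to_oai` lives in llama.cpp's C++ server code; the request shapes and the system-prompt handling below are simplified assumptions for illustration only.

```python
# Illustrative sketch only: the real conversion is implemented in C++
# inside the llama.cpp server. Field handling here is simplified.

def convert_anthropic_to_oai(anthropic_req: dict) -> dict:
    """Map an Anthropic Messages-style request body to an
    OpenAI chat-completions-style request body."""
    oai_req = {
        "messages": list(anthropic_req.get("messages", [])),
        "max_tokens": anthropic_req.get("max_tokens"),
    }
    # Anthropic requests carry the system prompt as a top-level "system"
    # field; OpenAI-style requests expect it as the first message.
    if "system" in anthropic_req:
        oai_req["messages"].insert(
            0, {"role": "system", "content": anthropic_req["system"]}
        )
    # The b8909 fix described above: forward chat_template_kwargs so
    # per-request template options survive the conversion.
    if "chat_template_kwargs" in anthropic_req:
        oai_req["chat_template_kwargs"] = anthropic_req["chat_template_kwargs"]
    return oai_req
```

Before this release, the analogous step was missing, so a client setting template options through the Anthropic-style endpoint would silently lose them once the request was rewritten into OpenAI format.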

This release also expands platform support significantly, with pre-built binaries for:
  • macOS Apple Silicon (standard and KleidiAI-enabled) and an iOS XCFramework
  • Linux on x64, arm64, and s390x, with CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP32/FP16) backends
  • Android arm64
  • Windows x64 CPU and arm64 CPU, plus CUDA 12.4/13.1, Vulkan, SYCL, and HIP builds
  • openEuler on x86 and aarch64 with Ascend 310p/910b NPUs
This wide range of builds ensures that llama.cpp can run efficiently on diverse hardware, from consumer devices to enterprise servers.

Key Points
  • Server update: convert_anthropic_to_oai now copies chat_template_kwargs for better Anthropic-to-OpenAI API conversion
  • New builds: macOS Apple Silicon (KleidiAI), iOS XCFramework, Android arm64, Windows arm64 CPU, openEuler (Ascend NPUs)
  • Expanded GPU support: CUDA 12.4/13.1, ROCm 7.2, Vulkan, SYCL FP32/FP16, HIP, and OpenVINO

Why It Matters

Simplifies cross-platform AI inference and API compatibility, enabling broader deployment of local LLMs.