b8909
New release improves Anthropic API conversion and expands prebuilt binaries to iOS, Android, and more
The latest release of llama.cpp, version b8909, introduces a key server update: the `convert_anthropic_to_oai` function now also copies `chat_template_kwargs`. This change streamlines interoperability by enabling Anthropic API requests to be transparently converted into OpenAI-compatible format, making it easier for developers to use Anthropic clients with llama.cpp servers.
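To illustrate the idea, here is a minimal Python sketch of what an Anthropic-to-OpenAI request conversion conceptually does, with the b8909 behavior of carrying `chat_template_kwargs` through. This is not llama.cpp's actual implementation (the real `convert_anthropic_to_oai` lives in the C++ server code), and the field handling shown here is a simplified assumption based on the two API shapes:

```python
# Conceptual sketch only -- NOT llama.cpp's C++ implementation.
# Maps an Anthropic Messages API request body onto an OpenAI
# chat-completions-style body, copying chat_template_kwargs through.

def convert_anthropic_to_oai(body: dict) -> dict:
    oai = {
        "model": body.get("model"),
        "messages": [],
        "max_tokens": body.get("max_tokens"),
    }
    # Anthropic carries the system prompt as a top-level "system" field;
    # the OpenAI format expects it as the first message.
    if "system" in body:
        oai["messages"].append({"role": "system", "content": body["system"]})
    oai["messages"].extend(body.get("messages", []))
    # The b8909 change: pass chat_template_kwargs through unchanged, so
    # template options set by the client survive the conversion.
    if "chat_template_kwargs" in body:
        oai["chat_template_kwargs"] = body["chat_template_kwargs"]
    return oai

# Example request (hypothetical values, for illustration only):
req = {
    "model": "llama",
    "system": "You are helpful.",
    "messages": [{"role": "user", "content": "Hi"}],
    "max_tokens": 64,
    "chat_template_kwargs": {"enable_thinking": False},
}
converted = convert_anthropic_to_oai(req)
```

Before this change, a client-supplied `chat_template_kwargs` would be silently dropped during conversion; now the server's chat-template machinery sees the same options regardless of which API flavor the client spoke.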
This release expands platform support significantly, with pre-built binaries for macOS Apple Silicon (both standard and KleidiAI-enabled), iOS XCFramework, Linux across multiple architectures (x64, arm64, s390x) and backends (CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Android arm64, Windows (x64 CPU, arm64 CPU, CUDA 12.4/13.1, Vulkan, SYCL, HIP), and openEuler (x86 and aarch64 with Ascend 310p/910b). The wide range of builds ensures that llama.cpp can run efficiently on diverse hardware, from consumer devices to enterprise servers.
- Server update: convert_anthropic_to_oai now copies chat_template_kwargs for better Anthropic-to-OpenAI API conversion
- New builds: macOS Apple Silicon (KleidiAI), iOS XCFramework, Android arm64, Windows arm64 CPU, openEuler (Ascend NPUs)
- Expanded GPU support: CUDA 12.4/13.1, ROCm 7.2, Vulkan, SYCL FP32/FP16, HIP, and OpenVINO
Why It Matters
Simplifies cross-platform AI inference and API compatibility, enabling broader deployment of local LLMs.