llama.cpp b9311 adds KleidiAI, updates cpp-httplib for local LLM
New version boosts Apple Silicon performance with KleidiAI and expands GPU support.
Get AI news that actually matters
One email a day. Zero fluff. Join 10,000+ professionals.
llama.cpp's b9311 release brings significant platform improvements for running large language models locally. The standout addition is KleidiAI support on macOS Apple Silicon (arm64), which optimizes matrix operations for Apple's Neural Engine and GPU, resulting in faster inference on M-series chips. The update also bumps cpp-httplib to v0.45.1, improving HTTP handling for server modes.
The release broadens hardware compatibility with new builds: Windows now supports both CUDA 12 and CUDA 13, plus Vulkan, SYCL, and HIP. Linux users get Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP16/FP32). The project also releases openEuler builds for Huawei's Ascend processors (310p and 910b). iOS XCFramework is included for mobile deployment. Each binary is signed with GitHub's verified signature (GPG key B5690EEEBB952194), ensuring trustworthiness.
- KleidiAI enabled on macOS Apple Silicon (arm64) for optimized AI inference
- Updated cpp-httplib to 0.45.1 for improved HTTP server functionality
- New builds: Windows CUDA 13, ROCm 7.2, OpenVINO, SYCL, openEuler with Ascend support
Why It Matters
Broader hardware support and KleidiAI acceleration make local LLM deployment faster and more accessible across devices.