Developer Tools

llama.cpp b9311 adds KleidiAI, updates cpp-httplib for local LLM

New version boosts Apple Silicon performance with KleidiAI and expands GPU support.

Deep Dive

llama.cpp's b9311 release brings significant platform improvements for running large language models locally. The standout addition is KleidiAI support on macOS Apple Silicon (arm64), which optimizes matrix operations for Apple's Neural Engine and GPU, resulting in faster inference on M-series chips. The update also bumps cpp-httplib to v0.45.1, improving HTTP handling for server modes.

The release broadens hardware compatibility with new builds: Windows now supports both CUDA 12 and CUDA 13, plus Vulkan, SYCL, and HIP. Linux users get Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP16/FP32). The project also releases openEuler builds for Huawei's Ascend processors (310p and 910b). iOS XCFramework is included for mobile deployment. Each binary is signed with GitHub's verified signature (GPG key B5690EEEBB952194), ensuring trustworthiness.

Key Points
  • KleidiAI enabled on macOS Apple Silicon (arm64) for optimized AI inference
  • Updated cpp-httplib to 0.45.1 for improved HTTP server functionality
  • New builds: Windows CUDA 13, ROCm 7.2, OpenVINO, SYCL, openEuler with Ascend support

Why It Matters

Broader hardware support and KleidiAI acceleration make local LLM deployment faster and more accessible across devices.