llama.cpp b9172 adds KleidiAI, updates all platform builds
110K-star project now runs on ARM64, Vulkan, CUDA 13, and more...
llama.cpp, the popular open-source C++ library for running large language models locally (110K stars, 18.2K forks), has published release b9172, which focuses on platform support and build distribution. It ships pre-compiled binaries for 18 target configurations covering macOS (Apple Silicon with and without KleidiAI, Intel x64), Windows (CPU, CUDA 12.4, CUDA 13.1, Vulkan, SYCL, HIP), Linux (CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Android (arm64 CPU), and iOS (XCFramework). Builds for openEuler with ACL Graph support are also included.
Beyond the expanded build matrix, the release contains one functional change: a WebUI fix that uses lowercase hashes for HuggingFace checksum verification (issue #23107), matching the format HuggingFace expects. No major new features are introduced, but the broad set of pre-built binaries lowers the barrier for developers and users on less common platforms, who can run local LLMs without compiling from source. The KleidiAI-enabled Apple Silicon build, which uses Arm's KleidiAI library, may improve performance for machine learning workloads on macOS.
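To illustrate the kind of mismatch the checksum fix addresses, here is a minimal Python sketch of a case-normalized hash comparison. It is not the WebUI's actual code: the function names are hypothetical, and it assumes the checksum is a SHA-256 hex digest, which the release notes do not state.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    # hashlib's hexdigest() always yields lowercase hex characters.
    return hashlib.sha256(data).hexdigest()


def checksum_matches(data: bytes, expected_hex: str) -> bool:
    """Compare a computed digest with an expected one, ignoring letter case.

    Hypothetical helper: the expected value is normalized to lowercase so
    that an uppercase hash does not cause a spurious mismatch.
    """
    return sha256_hex(data) == expected_hex.strip().lower()


if __name__ == "__main__":
    payload = b"example model bytes"  # stand-in for a downloaded file
    upper = sha256_hex(payload).upper()
    # A naive string comparison fails on case alone...
    print(sha256_hex(payload) == upper)
    # ...while the normalized comparison succeeds.
    print(checksum_matches(payload, upper))
```

Normalizing both sides to one case before comparing keeps verification independent of how either party happens to format the hex string.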
- Version b9172 adds pre-built binaries for 18 platform combinations including KleidiAI-enabled Apple Silicon, CUDA 13, ROCm 7.2, and openEuler.
- A WebUI fix (issue #23107) uses lowercase hashes for HuggingFace checksum verification, matching HuggingFace's expected format.
- llama.cpp remains one of the most-starred LLM inference engines on GitHub, with 110K stars and 18.2K forks.
Why It Matters
With pre-built binaries for nearly every major platform, llama.cpp lets developers run LLMs locally without setting up a build toolchain first.