llama.cpp b9172 adds KleidiAI, updates all platform builds

llama.cpp Releases · May 16, 2026

⚡110K-star project now runs on ARM64, Vulkan, CUDA 13, and more...

Deep Dive

llama.cpp, the popular open-source C++ library for running large language models locally (110K stars, 18.2K forks), has released version b9172. The release focuses on platform coverage and build distribution, shipping pre-compiled binaries for 18 target configurations: macOS (Apple Silicon with and without KleidiAI, Intel x64), Windows (CPU, CUDA 12.4, CUDA 13.1, Vulkan, SYCL, HIP), Linux (CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Android (arm64 CPU), iOS (XCFramework), and openEuler with ACL Graph support.
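As a quick sanity check on the count of 18, the build matrix listed above can be tallied in a few lines. This is only an illustrative sketch: the grouping is an assumption made here, not something taken from the release itself. In particular, it counts SYCL FP32 and SYCL FP16 as two separate builds and openEuler as a single configuration.

```python
# Build matrix as listed in the article. The grouping is an assumption:
# SYCL FP32 and FP16 are counted separately, and openEuler is counted once.
BUILD_MATRIX = {
    "macOS": ["Apple Silicon", "Apple Silicon + KleidiAI", "Intel x64"],
    "Windows": ["CPU", "CUDA 12.4", "CUDA 13.1", "Vulkan", "SYCL", "HIP"],
    "Linux": ["CPU", "Vulkan", "ROCm 7.2", "OpenVINO", "SYCL FP32", "SYCL FP16"],
    "Android": ["arm64 CPU"],
    "iOS": ["XCFramework"],
    "openEuler": ["ACL Graph"],
}

# Sum the number of variants across all platforms.
total = sum(len(variants) for variants in BUILD_MATRIX.values())

for platform, variants in BUILD_MATRIX.items():
    print(f"{platform}: {len(variants)}")
print(f"total: {total}")  # total: 18
```

Under those assumptions the per-platform counts are 3, 6, 6, 1, 1, and 1, which matches the stated total.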

Beyond the expanded build matrix, the release contains one functional change: a WebUI fix that uses lowercase hashes when verifying HuggingFace checksums (issue #23107), matching the format HuggingFace expects. No major new features are introduced, but the broad set of pre-built binaries lowers the barrier for developers and users on less common platforms, letting them run local LLMs without compiling from source. The KleidiAI-enabled build, which uses Arm's KleidiAI library of optimized kernels, may improve inference performance on Apple Silicon.
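The checksum fix comes down to normalizing case before comparing hex digests, since the same hash can be written in uppercase or lowercase. The sketch below illustrates that idea only and is not the project's actual WebUI code. It assumes SHA-256 hex digests, and the helper names `sha256_hex` and `checksums_match` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def checksums_match(expected: str, actual: str) -> bool:
    """Compare two hex digests after normalizing case and surrounding whitespace."""
    return expected.strip().lower() == actual.strip().lower()


# Stand-in for the bytes of a downloaded model file (hypothetical data).
payload = b"example model bytes"
digest = sha256_hex(payload)

# An uppercase copy of the same digest still matches after normalization.
print(checksums_match(digest.upper(), digest))  # True
```

Without the `lower()` call, a plain string comparison would reject an otherwise valid file whenever the two sides disagree only on letter case.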

Key Points

Version b9172 adds pre-built binaries for 18 platform combinations including KleidiAI-enabled Apple Silicon, CUDA 13, ROCm 7.2, and openEuler.
New WebUI fix ensures lowercase hash comparison for HuggingFace checksums, resolving compatibility issues.
llama.cpp remains the most-starred LLM inference engine on GitHub with 110K stars and 18.2K forks.

Why It Matters

llama.cpp makes local LLM inference accessible on nearly every major platform, and pre-built binaries let developers get started without compiling from source.
