b8918
The new release brings KleidiAI support for Apple Silicon and ships pre-built binaries for more than 20 build targets.
The llama.cpp project, spearheaded by ggml-org, has released version b8918, a maintenance update that refines coding style and expands platform compatibility. It is a minor version bump, but a meaningful one for developers relying on local LLM inference across heterogeneous environments. The release commit carries GitHub's verified signature, and the update continues the project's emphasis on stability and broad hardware support, a hallmark of a codebase that has garnered 106k stars and 17.3k forks.
Key highlights include pre-built binaries for macOS (Apple Silicon arm64 with and without KleidiAI, plus Intel x64), Linux (x64, arm64, and s390x, with Vulkan, ROCm 7.2, OpenVINO, and SYCL FP32/FP16 variants), Windows (x64 and arm64, with CUDA 12/13, Vulkan, SYCL, and HIP), Android (arm64), and iOS. The addition of KleidiAI builds for Apple Silicon and openEuler builds (x86 and aarch64 with ACL Graph) underscores a commitment to edge and enterprise deployments alike. With these binaries, users can run large language models (LLMs) such as LLaMA, Mistral, and others locally with optimized performance on nearly any modern hardware, from smartphones to data center GPUs.
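To ground the "run LLMs locally" claim, here is a minimal sketch of what consuming the release looks like at the C API level: load a GGUF model and report its parameter count. It assumes the llama.h C API as found in recent llama.cpp releases; names such as `llama_model_load_from_file` and `llama_model_n_params` have changed across versions, so check the header shipped with b8918 before relying on them.

```c
#include <stdio.h>
#include "llama.h"

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init(); // initialize whichever ggml backends this build includes

    struct llama_model_params params = llama_model_default_params();
    params.n_gpu_layers = 99; // offload all layers if a GPU backend is present

    struct llama_model * model = llama_model_load_from_file(argv[1], params);
    if (model == NULL) {
        fprintf(stderr, "failed to load %s\n", argv[1]);
        return 1;
    }

    printf("loaded model with %llu parameters\n",
           (unsigned long long) llama_model_n_params(model));

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

Most users will never write this code themselves: the pre-built binaries ship ready-made tools such as llama-cli and llama-server that wrap the same API.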
- Supports 20+ build targets including macOS, Linux, Windows, Android, and iOS.
- Adds KleidiAI builds for Apple Silicon, ROCm 7.2 builds for Linux, and openEuler builds (x86 and aarch64) with ACL Graph.
- Includes CUDA 12/13, Vulkan, SYCL, and HIP builds for GPU acceleration; a device-enumeration sketch follows this list.
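To make the backend list above concrete, here is a hedged sketch of enumerating whichever compute devices a given binary was built with (or can load as plugins), using the device-registry API from ggml's `ggml-backend.h` header. The function names (`ggml_backend_load_all`, `ggml_backend_dev_count`, and friends) are taken from recent upstream headers and may differ between releases; treat this as an illustration rather than a pinned API reference.

```c
#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    // In builds with dynamically loadable backends (GGML_BACKEND_DL),
    // this scans for and registers backend plugins (CUDA, Vulkan, ...).
    ggml_backend_load_all();

    size_t n = ggml_backend_dev_count();
    printf("%zu compute device(s) registered\n", n);

    for (size_t i = 0; i < n; i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        // The name is a short identifier; the description typically
        // carries the human-readable device string.
        printf("  %-12s %s\n", ggml_backend_dev_name(dev),
               ggml_backend_dev_description(dev));
    }
    return 0;
}
```

Running this against a Vulkan build versus a CUDA 12 build should list different devices, which is a quick way to verify you downloaded the right binary for your hardware.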
Why It Matters
Enables local LLM inference on diverse hardware, democratizing AI access for developers and enterprises.