llama.cpp b9365 improves CI with ARM self-hosted runners, disables KleidiAI on macOS
Popular open-source LLM runtime moves ARM builds to self-hosted runners and drops the KleidiAI release on Mac.
The latest release of llama.cpp (b9365) from ggml-org focuses on infrastructure improvements rather than new model features. The core change moves ARM build jobs from GitHub-hosted runners to self-hosted hardware, which likely improves reliability and speed for macOS and iOS ARM64 builds. Separately, the ARM macOS release built with KleidiAI (Arm's library of optimized micro-kernels for AI workloads such as matrix multiplication) is now disabled, while the standard Apple Silicon (arm64) build remains available.
Platform coverage spans all major OSes: Linux x64/ARM64/s390x with CPU, Vulkan, ROCm 7.2, and OpenVINO backends; Windows x64/ARM64 with CPU, CUDA 12 & 13, Vulkan, and HIP; plus Android ARM64 and an iOS XCFramework. The release also includes updates to UI assets and fixes for dependency linking. With 113k GitHub stars and 18.9k forks, llama.cpp remains the go-to open-source runtime for running large language models locally on consumer hardware. This incremental update is aimed at keeping the build pipeline maintainable as the project scales.
- ARM macOS and Linux builds now use self-hosted runners instead of GitHub-hosted ones
- KleidiAI-accelerated macOS release disabled; standard Apple Silicon (arm64) build still available
- Full platform support: Linux (CPU/Vulkan/ROCm/OpenVINO), Windows (CPU/CUDA 12&13/Vulkan/HIP), Android, iOS
Why It Matters
llama.cpp's infrastructure update should make builds for local LLM inference faster and more reliable across all major platforms.