llama.cpp b9399 refines OpenCL backend, expands builds to 20+ platforms
New release brings cleaner OpenCL code and optimized builds for Apple Silicon with KleidiAI.
The llama.cpp project, a C/C++ implementation of LLM inference optimized for local and edge devices, has released version b9399. This incremental update primarily targets the OpenCL backend for GPU acceleration, moving backend info printing into a dedicated function (commit 408ae2b). The change is structural rather than performance-focused: it does not alter inference behavior, but it improves code maintainability and may make later OpenCL work easier. The release carries a verified GPG signature from GitHub Actions, which helps users confirm the provenance of what they download.
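To illustrate the kind of refactor described above, here is a minimal C++ sketch of moving info printing out of initialization code and into its own function. It is not the code from commit 408ae2b: `BackendInfo`, `format_backend_info`, `print_backend_info`, and `init_backend` are hypothetical names chosen for this example, and the fields are placeholders rather than the values the real OpenCL backend reports.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the details a GPU backend might report.
struct BackendInfo {
    std::string platform_name;
    std::string device_name;
    std::string driver_version;
};

// Builds the human-readable summary in one place, so the format
// can change without touching initialization code.
std::string format_backend_info(const BackendInfo & info) {
    std::ostringstream out;
    out << "platform: " << info.platform_name << "\n"
        << "device: "   << info.device_name   << "\n"
        << "driver: "   << info.driver_version << "\n";
    return out.str();
}

// Dedicated printing function: callers pass the info and, optionally,
// the stream to write to (stderr by default).
void print_backend_info(const BackendInfo & info, std::ostream & os = std::cerr) {
    os << format_backend_info(info);
}

// Initialization now delegates to the printing function instead of
// interleaving log statements with setup logic.
BackendInfo init_backend() {
    BackendInfo info{"ExamplePlatform", "ExampleGPU", "1.0"};  // placeholder values
    print_backend_info(info);
    return info;
}
```

Separating the formatting from the setup path keeps initialization short and makes the output easy to test or redirect, which is the maintainability benefit a refactor of this shape usually aims for.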
Notably, b9399 expands official binary availability to over 20 platform configurations. For Apple users, builds include macOS Apple Silicon (arm64) with optional KleidiAI acceleration, macOS Intel (x64), and an iOS XCFramework. Linux builds cover x64, arm64, and s390x, with Vulkan, ROCm 7.2, OpenVINO, and SYCL backends. Windows users get CPU, CUDA 12/13, Vulkan, and HIP builds. Android arm64 and openEuler (x86/aarch64) are also included. With about 114k stars, llama.cpp is among the most-starred LLM inference engines on GitHub, and it lets developers run models such as Llama, Mistral, and Gemma locally without cloud dependencies.
- Refactored OpenCL backend by moving info printing into its own function (commit 408ae2b) for cleaner code structure.
- Official builds now support 20+ platform configurations including Apple Silicon with KleidiAI, Linux with ROCm 7.2, and Windows with CUDA 13.
- Release is GPG-signed by GitHub Actions; the project has about 114k stars and 18.9k forks on GitHub.
Why It Matters
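Developers running LLMs locally get prebuilt binaries for a broader range of platforms, and a tidier OpenCL backend that should be easier to maintain and extend. The refactor itself does not change GPU performance or behavior.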