b8825
Latest commit enables KleidiAI for Apple Silicon and OpenVINO for Linux, boosting local LLM performance.
The open-source powerhouse behind efficient local AI, ggml-org, has pushed a significant update to its flagship llama.cpp project with commit b8825. This release focuses on expanding hardware acceleration options, most notably by introducing a dedicated macOS build for Apple Silicon (arm64) with KleidiAI enabled. KleidiAI is Arm's library of optimized micro-kernels for AI workloads, and it can significantly speed up large language model (LLM) inference on Apple's M-series chips. In parallel, the update adds a new build target for Ubuntu Linux with support for Intel's OpenVINO toolkit, giving developers on x64 systems another path to optimized inference on Intel hardware.
The release, built and published automatically via GitHub Actions, also includes a build-system change: CMake's glob functionality is now used to collect the source files for model implementations, which can simplify maintenance as new model code is added. Pre-built binaries are provided across a wide array of platforms, including Windows (with CUDA, Vulkan, and SYCL backends), various Linux distributions, and even openEuler. This multi-platform support underscores llama.cpp's role as a universal runtime for running models like Meta's Llama 3, allowing developers to deploy the same model across different operating systems and hardware accelerators from a single codebase.
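That "same model, any backend" workflow is visible in the project's C API: the application code that loads a GGUF file does not change depending on whether the binary was built with KleidiAI, OpenVINO, CUDA, or plain CPU support. The sketch below is a minimal illustration using long-standing llama.h entry points (llama_backend_init, llama_load_model_from_file, llama_new_context_with_model); some of these names have been renamed or deprecated across releases, so treat it as an assumption-laden outline rather than code tied to this specific commit.

```cpp
// Minimal sketch: load a GGUF model through the llama.cpp C API.
// The exact function names/signatures vary between releases; this is
// illustrative only, not the API guaranteed at commit b8825.
#include "llama.h"

#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]);
        return 1;
    }

    // Initialize whichever backends this binary was compiled with
    // (e.g. KleidiAI/Metal on Apple Silicon, CUDA/Vulkan/SYCL on Windows).
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file(argv[1], mparams);
    if (model == nullptr) {
        std::fprintf(stderr, "failed to load %s\n", argv[1]);
        llama_backend_free();
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // ... tokenize a prompt and run llama_decode() here ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

Because backend selection happens at build/link time (and at runtime for registered accelerators), the same application source can ship on macOS, Windows, and Linux with only the build configuration changing.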
- Adds a new macOS Apple Silicon build variant with KleidiAI acceleration for faster LLM inference on M-series Macs.
- Introduces Ubuntu x64 support for Intel's OpenVINO, providing an optimized inference backend for Linux developers on Intel hardware.
- Maintains extensive cross-platform support with binaries for Windows CUDA/Vulkan, Linux ROCm, and iOS, solidifying its role as a universal LLM runtime.
Why It Matters
These additions lower the barrier to running powerful LLMs locally on consumer hardware, giving developers and researchers faster and more flexible deployment options.