b8839
The b8839 release adds a KleidiAI-enabled build for Apple Silicon Macs and refactors bias tensor naming in the model code.
The open-source llama.cpp project, maintained under the ggml-org organization, has rolled out a new release tagged b8839. The update covers two areas: core code cleanup and expanded hardware acceleration. The most notable code change is a refactor of bias tensor variable names within the model architecture code, improving handling for models like jina-bert-v2. This refactor, detailed in pull request #22079, makes the inference engine's low-level code clearer and easier to maintain for contributors.
Beyond code quality, the release delivers a performance boost for Apple users: it adds a dedicated build variant for "macOS Apple Silicon (arm64, KleidiAI enabled)." KleidiAI is Arm's library of optimized micro-kernels for AI workloads on Arm CPUs, and since Apple Silicon is Arm-based, the new variant can meaningfully speed up running models like Llama 3 or Mistral locally on MacBooks and Mac Studios. The team has also packaged pre-compiled binaries for an extensive list of platforms, ensuring easy deployment on everything from Windows PCs with NVIDIA CUDA GPUs to Linux servers with AMD ROCm support and even Android devices.
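For readers who build from source rather than download a binary, the KleidiAI path is selected at compile time and then used transparently by the CPU backend; application code does not change. Below is a minimal sketch of loading a model through the llama.cpp C API. The CMake flag name (GGML_CPU_KLEIDIAI) and the exact API entry points are assumptions based on recent llama.cpp versions, not taken from this release's notes.

```cpp
// Minimal sketch: load a GGUF model with the llama.cpp C API.
// Assumed build invocation (flag name not confirmed by the release notes):
//   cmake -B build -DGGML_CPU_KLEIDIAI=ON && cmake --build build
// When KleidiAI kernels are compiled in, the CPU backend uses them
// automatically; nothing in this program references them directly.
#include "llama.h"
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init();

    // Default parameters; the CPU backend picks the fastest kernels available.
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file(argv[1], mparams);
    if (model == nullptr) {
        fprintf(stderr, "failed to load %s\n", argv[1]);
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);

    // ... tokenization and decoding would go here ...

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```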
- Adds KleidiAI acceleration for Apple Silicon Macs, promising faster local AI inference.
- Refactors bias tensor variable names (PR #22079) to improve code structure for models like jina-bert-v2.
- Provides pre-built binaries for Windows (CUDA 12/13, Vulkan), Linux (CPU/ROCm), Android, and iOS.
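For anyone grabbing one of the pre-built binaries, it can be worth checking which CPU features a build actually enables. The llama.cpp C API exposes a system-info string for this; the sketch below assumes the llama_print_system_info() entry point from recent versions, and whether a KleidiAI flag appears in its output is not confirmed by the release notes.

```cpp
// Print the feature flags compiled into this llama.cpp build.
// llama_print_system_info() returns a string along the lines of
// "NEON = 1 | ARM_FMA = 1 | ..." (exact contents vary by build).
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();
    printf("%s\n", llama_print_system_info());
    llama_backend_free();
    return 0;
}
```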
Why It Matters
This update should make running powerful AI models locally on Apple Silicon Macs noticeably faster and more efficient for developers and enthusiasts.