Developer Tools

llama.cpp b9487 updates BoringSSL, adds KleidiAI for Apple Silicon

Popular local LLM runtime updates its bundled BoringSSL and adds a KleidiAI-accelerated build for Apple Silicon.

Deep Dive

The latest release of llama.cpp (b9487) delivers two headline changes: the bundled BoringSSL library moves to v0.20260526.0, keeping its TLS stack current, and the macOS Apple Silicon build gains KleidiAI support, which uses Arm's optimized CPU micro-kernels to speed up inference. llama.cpp is a C/C++ inference engine that started with LLaMA-family models and has become one of the most widely used tools for running large language models such as Llama 3, Mistral, and Gemma entirely on-device, often on consumer hardware.

The release ships prebuilt binaries across a wide range of platforms: Linux (x64 and arm64 with CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL), Windows (x64 and arm64 with CPU, CUDA 12/13, Vulkan, and HIP), macOS (Intel x64 and Apple Silicon, the latter with optional KleidiAI), and Android arm64. These builds let developers integrate local LLM inference into apps without compiling from source. The repository, with 114K stars and 19K forks, remains one of the most prominent projects in the local AI ecosystem, with a focus on performance and cross-platform compatibility.
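To make the build matrix easier to scan, here is a minimal Python sketch that records the platforms listed above as a lookup table and returns the variants that could apply to a given machine. It is an illustration, not part of llama.cpp: the names `BUILDS`, `UNNAMED`, and `candidate_builds` are hypothetical. The summary does not say which Linux or Windows backend ships for which CPU architecture, so the table only records what is listed per operating system.

```python
import platform

# Hypothetical label for builds where the summary names no specific backend.
UNNAMED = "unnamed"

# (operating system, architectures, backends or variants named in the summary).
# Assumption: Linux and Windows backends are recorded per OS, because the
# summary does not pair each backend with a specific architecture.
BUILDS = [
    ("linux", {"x64", "arm64"}, ["CPU", "Vulkan", "ROCm 7.2", "OpenVINO", "SYCL"]),
    ("windows", {"x64", "arm64"}, ["CPU", "CUDA 12", "CUDA 13", "Vulkan", "HIP"]),
    ("macos", {"x64"}, [UNNAMED]),
    ("macos", {"arm64"}, [UNNAMED, "KleidiAI"]),
    ("android", {"arm64"}, [UNNAMED]),
]

# Map the strings reported by Python's platform module to the labels above.
_OS = {"Linux": "linux", "Windows": "windows", "Darwin": "macos", "Android": "android"}
_ARCH = {"x86_64": "x64", "amd64": "x64", "arm64": "arm64", "aarch64": "arm64"}


def candidate_builds(system=None, machine=None):
    """Return the listed variants for an OS/architecture pair.

    Defaults to the current host. Returns an empty list when the pair
    does not appear in the summary.
    """
    os_name = _OS.get(system or platform.system())
    arch = _ARCH.get((machine or platform.machine()).lower())
    found = []
    for build_os, archs, variants in BUILDS:
        if build_os == os_name and arch in archs:
            found.extend(variants)
    return found


if __name__ == "__main__":
    # Output depends on the machine running the script.
    print(candidate_builds())
```

For example, `candidate_builds("Darwin", "arm64")` returns `['unnamed', 'KleidiAI']`, reflecting the optional KleidiAI variant on Apple Silicon. Many Python builds report Android as `Linux`, so passing `system="Android"` explicitly is the safer choice there.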

Key Points
  • Updated BoringSSL to 0.20260526.0 for security hardening
  • KleidiAI acceleration now available on macOS Apple Silicon (arm64)
  • Prebuilt binaries for 9+ platform/backend combos including Vulkan, CUDA 12/13, ROCm, and Android

Why It Matters

Secure, fast local LLM inference across every major platform keeps AI private and offline-ready.