Developer Tools

b9002

New build supports Apple Silicon KleidiAI, CUDA 13, and more platforms

Deep Dive

The llama.cpp open-source project has released version b9002, a significant update that introduces new hardware acceleration options and platform builds. Most notably, the release adds KleidiAI support for Apple Silicon (arm64) devices, enabling faster inference on macOS and iOS through Arm's optimized micro-kernels for machine-learning workloads. For Windows users, the release packages separate CUDA 12 and CUDA 13 DLLs, along with Vulkan, SYCL, and HIP (AMD) builds. Linux gains new builds for Vulkan, ROCm 7.2, OpenVINO, and both FP32 and FP16 SYCL variants, plus support for the s390x architecture.

Beyond new GPU backends, the release expands platform coverage with Android arm64 CPU builds and multiple openEuler variants for Huawei Ascend NPUs (310p and 910b with ACL Graph). The earlier macOS Intel (x64) build remains supported, and all builds are available as downloadable assets on the GitHub release page. The breadth of targets, from consumer laptops to enterprise servers, reflects the community's push to make local LLM inference accessible across diverse hardware ecosystems.
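For developers who build from source instead of downloading the prebuilt assets, the backends above correspond to CMake options in llama.cpp's build system. The sketch below is hedged: the flag names (`GGML_CPU_KLEIDIAI`, `GGML_CUDA`, `GGML_VULKAN`) reflect the upstream options at the time of writing and may differ by version, so check the repository's build documentation for your exact release.

```shell
# Hedged sketch: configuring llama.cpp with one of the backends
# mentioned in this release. Verify flag names against the build
# docs in the repository for your version before relying on them.

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Apple Silicon: enable KleidiAI's Arm-optimized CPU kernels
cmake -B build -DGGML_CPU_KLEIDIAI=ON

# NVIDIA GPUs: enable the CUDA backend instead
# cmake -B build -DGGML_CUDA=ON

# Cross-vendor GPUs: enable the Vulkan backend instead
# cmake -B build -DGGML_VULKAN=ON

# Compile the selected configuration
cmake --build build --config Release -j
```

Only one backend configuration is typically enabled per build; the commented-out lines show the alternatives rather than a combined setup.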

Key Points
  • New KleidiAI acceleration for Apple Silicon (macOS/iOS) improves inference speed
  • Windows adds separate CUDA 12 and CUDA 13 DLLs plus Vulkan, SYCL, and HIP builds
  • Linux gains ROCm 7.2, OpenVINO, SYCL FP32/FP16, and s390x CPU support

Why It Matters

Developers can now run LLMs locally on more devices with optimized performance across CPUs and GPUs.