Llama.cpp b9097 expands GPU support across Apple, AMD, Intel, and NVIDIA
New release adds KleidiAI for Apple Silicon and ROCm 7.2 for AMD GPUs...
Deep Dive
The ggml-org team released llama.cpp version b9097, an update to the C++ inference engine for LLMs. This release includes builds for macOS Apple Silicon (with KleidiAI enabled), macOS Intel, iOS XCFramework, Linux (x64 and arm64 CPU, s390x, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Android arm64 CPU, Windows (x64 and arm64 CPU, CUDA 12 & 13, Vulkan, SYCL, HIP), and openEuler (x86 and aarch64 with various backends).
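To make the build list easier to scan, here is a minimal sketch that transcribes it into a lookup table and answers "which platforms ship a given backend?". The dictionary name `BUILDS_BY_OS` and the helper `platforms_with` are hypothetical names chosen for illustration, and the strings are descriptive labels taken from the summary above, not the actual release asset file names.

```python
# Build labels copied from the release summary above. They are descriptive
# strings only (an assumption for illustration), not real asset file names.
BUILDS_BY_OS = {
    "macOS": ["Apple Silicon (KleidiAI enabled)", "Intel"],
    "iOS": ["XCFramework"],
    "Linux": ["x64 CPU", "arm64 CPU", "s390x", "Vulkan", "ROCm 7.2",
              "OpenVINO", "SYCL FP32", "SYCL FP16"],
    "Android": ["arm64 CPU"],
    "Windows": ["x64 CPU", "arm64 CPU", "CUDA 12", "CUDA 13",
                "Vulkan", "SYCL", "HIP"],
    "openEuler": ["x86", "aarch64"],
}


def platforms_with(backend: str) -> list[str]:
    """Return the OS names whose build list mentions `backend` (case-insensitive)."""
    needle = backend.lower()
    return [
        os_name
        for os_name, builds in BUILDS_BY_OS.items()
        if any(needle in label.lower() for label in builds)
    ]


if __name__ == "__main__":
    # Print the platforms listed for a few of the GPU backends.
    for name in ("Vulkan", "SYCL", "CUDA", "ROCm"):
        print(f"{name}: {', '.join(platforms_with(name))}")
```

The table only records what the summary states, so a backend's absence from a platform here means it was not listed in the summary, not that llama.cpp cannot be built that way from source.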
Key Points
- Enables KleidiAI, Arm's library of optimized CPU kernels, in the macOS Apple Silicon (arm64) build to speed up inference on Macs
- Provides ROCm 7.2 builds on Linux and HIP builds on Windows for AMD GPUs, plus CUDA 12 and 13 builds on Windows for NVIDIA GPUs
- Includes Android arm64, openEuler with ACL Graph, and Vulkan across x64/arm64 Linux
Why It Matters
With prebuilt binaries for phones, desktops, and servers, llama.cpp b9097 lets developers run LLMs locally on most common hardware instead of paying for hosted inference.