Llama.cpp b9097 expands GPU support across Apple, AMD, Intel, and NVIDIA

llama.cpp Releases May 11, 2026

⚡New release adds KleidiAI for Apple Silicon and ROCm 7.2 for AMD GPUs...

Deep Dive

The ggml-org team released llama.cpp version b9097, an update to the C++ inference engine for LLMs. This release includes builds for macOS Apple Silicon (with KleidiAI enabled), macOS Intel, iOS XCFramework, Linux (x64 and arm64 CPU, s390x, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Android arm64 CPU, Windows (x64 and arm64 CPU, CUDA 12 & 13, Vulkan, SYCL, HIP), and openEuler (x86 and aarch64 with various backends).
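The build list above can be easier to scan as a lookup table. The following is a minimal sketch, not part of the release: the variant labels are descriptive placeholders transcribed from the summary (they are not actual release asset names), and `BUILD_MATRIX` and `find_builds` are hypothetical names introduced only for illustration. The openEuler entry lists architectures only, since the summary does not enumerate its backends.

```python
# Build variants per platform, transcribed from the release summary.
# Labels are descriptive placeholders, NOT real release asset names.
BUILD_MATRIX = {
    "macos": ["arm64 (KleidiAI enabled)", "x64 (Intel)"],
    "ios": ["XCFramework"],
    "linux": [
        "x64 CPU", "arm64 CPU", "s390x", "Vulkan",
        "ROCm 7.2", "OpenVINO", "SYCL FP32", "SYCL FP16",
    ],
    "android": ["arm64 CPU"],
    "windows": [
        "x64 CPU", "arm64 CPU", "CUDA 12", "CUDA 13",
        "Vulkan", "SYCL", "HIP",
    ],
    "openeuler": ["x86", "aarch64"],
}


def find_builds(keyword):
    """Return (platform, variant) pairs whose label contains keyword.

    Matching is a case-insensitive substring test on the variant label.
    """
    needle = keyword.lower()
    return [
        (platform, variant)
        for platform, variants in BUILD_MATRIX.items()
        for variant in variants
        if needle in variant.lower()
    ]


if __name__ == "__main__":
    # Which platforms list a Vulkan build?
    print(find_builds("vulkan"))  # [('linux', 'Vulkan'), ('windows', 'Vulkan')]
```

Searching by backend name rather than by platform reflects how the summary is usually read: a developer starts from the GPU or runtime they have and checks which operating systems ship a matching build.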

Key Points

Enables KleidiAI acceleration in the macOS Apple Silicon (arm64) build, targeting faster CPU inference on Macs
Supports ROCm 7.2 on Linux for AMD GPUs and CUDA 12/13 on Windows for NVIDIA GPUs
Includes Android arm64, openEuler with ACL Graph, and Vulkan across x64/arm64 Linux

Why It Matters

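With prebuilt binaries spanning mobile, desktop, and server platforms, llama.cpp b9097 lets developers run LLMs locally on hardware ranging from phones to datacenter machines instead of paying for hosted inference.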
