
llama.cpp b9186 adds broad platform support and CUDA 12/13 backends

llama.cpp Releases May 17, 2026

⚡ New release supports Apple Silicon, Windows, Linux, Android, and more.

Deep Dive

ggml-org has released llama.cpp b9186, the latest build of its popular C/C++ inference engine for large language models. The release syncs with the upstream ggml library and ships prebuilt binaries for a wide range of platforms:
macOS: Apple Silicon (arm64, with optional KleidiAI acceleration), Intel, and an iOS XCFramework
Linux: Ubuntu builds for x64, arm64, and s390x, with Vulkan, ROCm 7.2, OpenVINO, and SYCL backends
Android: arm64
Windows: x64 and arm64 CPU builds, plus CUDA 12, CUDA 13, Vulkan, SYCL, and HIP (for AMD GPUs)
openEuler: builds for Huawei Ascend 310p and 910b processors

For developers and AI enthusiasts, b9186 lowers the barrier to running models such as LLaMA and Mistral on diverse hardware. On Windows, separate CUDA 12 and CUDA 13 builds cover NVIDIA GPUs, HIP covers AMD GPUs, and Vulkan and SYCL extend GPU support beyond NVIDIA hardware. The openEuler builds target enterprise users running Ascend NPUs. The release continues llama.cpp's mission of making on-device AI accessible, with a dedicated backend for each major class of hardware. Assets are available for direct download on the GitHub release page.
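Picking a download comes down to matching the host's operating system and CPU architecture against the platform list above. The Python sketch below encodes that list as a lookup table. The names `candidate_builds`, `ARCH_ALIASES`, `CPU_ARCHES`, and `BACKENDS` are hypothetical, and the returned labels are descriptive, not the actual asset filenames on the GitHub release page. iOS, Android, and openEuler/Ascend builds are left out for brevity.

```python
import platform
from typing import Optional

# Map the spellings platform.machine() returns to one label.
# (macOS reports "arm64"/"x86_64", Linux "aarch64"/"x86_64", Windows "ARM64"/"AMD64".)
ARCH_ALIASES = {
    "x86_64": "x64", "amd64": "x64",
    "arm64": "arm64", "aarch64": "arm64",
    "s390x": "s390x",
}

# CPU architectures with prebuilt binaries, as listed in the b9186 summary.
CPU_ARCHES = {
    "Darwin": {"arm64", "x64"},          # Apple Silicon and Intel Macs
    "Linux": {"x64", "arm64", "s390x"},  # Ubuntu builds
    "Windows": {"x64", "arm64"},
}

# Accelerator backends per OS, as listed in the summary. The article does not
# say which CPU architecture each backend ships for, so none is assumed here.
BACKENDS = {
    "Darwin": ["KleidiAI (optional)"],
    "Linux": ["Vulkan", "ROCm 7.2", "OpenVINO", "SYCL"],
    "Windows": ["CUDA 12", "CUDA 13", "Vulkan", "SYCL", "HIP"],
}


def candidate_builds(system: Optional[str] = None,
                     machine: Optional[str] = None) -> dict:
    """Hypothetical helper: list b9186 build families that match a host.

    Defaults to the current machine; pass values explicitly to query others.
    Returns descriptive labels only, not real release asset filenames.
    """
    system = system or platform.system()
    raw = (machine or platform.machine()).lower()
    arch = ARCH_ALIASES.get(raw, "unknown")
    backends = list(BACKENDS.get(system, []))
    if system == "Darwin" and arch != "arm64":
        backends = []  # KleidiAI is listed for Apple Silicon only
    return {
        "os": system,
        "arch": arch,
        "cpu_build": arch in CPU_ARCHES.get(system, set()),
        "backends": backends,
    }


if __name__ == "__main__":
    print(candidate_builds())
```

Called with no arguments, it inspects the current machine through Python's standard `platform` module. A result of `cpu_build: False` only means the summary lists no prebuilt binary for that combination.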

Key Points

Supports Apple Silicon (arm64) with optional KleidiAI acceleration for faster inference
Windows builds include CUDA 12 and 13 DLLs, plus Vulkan, SYCL, and HIP for AMD GPUs
openEuler support for Ascend 310p and 910b processors, targeting enterprise AI deployments

Why It Matters

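llama.cpp b9186 brings local LLM inference to most mainstream desktop, server, and mobile platforms, from Apple Silicon Macs to NVIDIA, AMD, and Intel GPUs and Ascend NPUs, making on-device AI practical on far more hardware.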
