llama.cpp b9451 ships with expanded platform and backend support
The new release supports CUDA 13 on Windows, plus ROCm 7.2 and OpenVINO on Linux.
The llama.cpp team has released b9451, an update to the lightweight C/C++ LLM inference engine that lets developers run open-weight models such as LLaMA and Mistral directly on consumer hardware. The release focuses on expanding compatibility and cleaning up the codebase, notably by removing unused functions from the Vulkan GPU code.
The b9451 release includes pre-built binaries for a wide array of platforms: macOS (Apple Silicon and Intel), iOS, Linux (x64, arm64, and s390x; CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL builds), Android (arm64, CPU only), and Windows (x64 and arm64; CPU, CUDA 12/13, Vulkan, SYCL, and HIP builds). The CUDA 13 DLLs and the ROCm 7.2 builds give users of recent NVIDIA and AMD GPUs, respectively, a pre-built path to GPU-accelerated inference. Builds for openEuler (a Linux distribution) are also listed, though they are disabled by default. The release also includes refreshed UI assets, suggesting improvements to the built-in web interface. For developers, this update maintains llama.cpp's position as a go-to tool for local, cross-platform LLM inference without heavy dependencies.
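To make the platform list above easier to scan, here is a minimal Python sketch that encodes it as a lookup table and reports which build flavors are listed for a given host. The table keys, the alias maps, and the `listed_backends` helper are all hypothetical names chosen for illustration; they are not llama.cpp asset names or APIs. The summary does not say which backend is paired with which CPU architecture, so the lookup returns the whole per-OS list rather than guessing at combinations. iOS is left out because no architecture or backend detail is given for it.

```python
import platform
from typing import List, Optional

# Matrix transcribed from the release summary. Keys and aliases are
# illustrative only and are NOT llama.cpp asset or package names.
PREBUILT = {
    "macos": {"archs": ("arm64", "x64"), "backends": ()},  # no backend named
    "linux": {
        "archs": ("x64", "arm64", "s390x"),
        "backends": ("CPU", "Vulkan", "ROCm 7.2", "OpenVINO", "SYCL"),
    },
    "android": {"archs": ("arm64",), "backends": ("CPU",)},
    "windows": {
        "archs": ("x64", "arm64"),
        "backends": ("CPU", "CUDA 12", "CUDA 13", "Vulkan", "SYCL", "HIP"),
    },
}

# Assumed mappings from common OS / machine strings to the keys above.
OS_ALIASES = {"darwin": "macos", "linux": "linux",
              "windows": "windows", "android": "android"}
ARCH_ALIASES = {"x86_64": "x64", "amd64": "x64", "x64": "x64",
                "arm64": "arm64", "aarch64": "arm64", "s390x": "s390x"}


def listed_backends(system: str, machine: str) -> Optional[List[str]]:
    """Return the build flavors listed for this OS, or None if the
    OS/architecture pair does not appear in the summary at all."""
    entry = PREBUILT.get(OS_ALIASES.get(system.lower(), ""))
    arch = ARCH_ALIASES.get(machine.lower())
    if entry is None or arch not in entry["archs"]:
        return None
    return list(entry["backends"])


if __name__ == "__main__":
    # Report what the summary lists for the machine running this script.
    print(listed_backends(platform.system(), platform.machine()))
```

An empty list means the platform is listed without a named backend (as with macOS), while `None` means the pair is not mentioned in the summary; neither result says anything about what can be built from source.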
- Removes unused Vulkan functions for cleaner GPU code
- Adds pre-built support for CUDA 12/13 (Windows), ROCm 7.2, OpenVINO, and SYCL
- Covers macOS, iOS, Linux, Android, and Windows, with openEuler builds present but disabled by default
Why It Matters
By shipping pre-built binaries for more platforms and GPU toolchains, llama.cpp b9451 makes local LLM inference easier to set up on diverse hardware, from smartphones to datacenter GPUs.