llama.cpp b9239 ships verbosity fix and broader platform support
New release supports macOS, Windows, Linux, and Android with GPU backends.
The b9239 release of llama.cpp, the popular C++ inference engine for LLaMA-family models, focuses on a quality-of-life fix: verbosity handling for the --fit option now works correctly when --verbosity is set to 4 (issue #23282). The bug could cause overly verbose or incomplete output for users who rely on --fit to tune how a model's memory is allocated.
More notably, the release ships precompiled artifacts for an extensive range of hardware and operating systems: macOS (Apple Silicon with and without KleidiAI, Intel x64, iOS XCFramework), Linux (x64 and arm64 CPUs, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Windows (CPU, arm64 CPU, CUDA 12 & 13, Vulkan, SYCL, HIP), Android arm64, and even openEuler on x86 and aarch64 (with 310p and 910b ACL Graph). This breadth of support reinforces llama.cpp's position as the go-to tool for running large language models locally across diverse hardware setups, from gaming PCs to edge devices.
- Fixes --fit verbosity handling when combined with --verbosity 4
- Provides prebuilt binaries for 20+ platform/backend combinations including CUDA 12/13, ROCm, Vulkan, and KleidiAI
- Supports macOS, Windows, Linux, Android, iOS, and openEuler on x64 and arm64 CPU architectures
Why It Matters
Enables developers to run LLMs locally on a wide range of devices, reducing cloud dependency and network latency.