Developer Tools

llama.cpp b9239 ships verbosity fix and broader platform support

New release supports macOS, Windows, Linux, and Android with GPU backends.

Deep Dive

The b9239 release of llama.cpp, the popular C++ inference engine for LLaMA-family models, focuses on a quality-of-life fix: verbosity handling for --fit now works correctly when --verbosity is set to 4 (issue #23282). This addresses a bug that could produce overly verbose or incomplete output for users tuning model memory allocation.

More notably, the release ships precompiled artifacts for an extensive range of hardware and operating systems: macOS (Apple Silicon with and without KleidiAI, Intel x64, iOS XCFramework), Linux (x64 and arm64 CPUs, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Windows (CPU, arm64 CPU, CUDA 12 & 13, Vulkan, SYCL, HIP), Android arm64, and openEuler on x86 and aarch64 (with 310p and 910b ACL Graph). This breadth reinforces llama.cpp's position as a go-to tool for running large language models locally across diverse hardware, from gaming PCs to edge devices.
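The platform list above can be read as a small lookup table. The following is a minimal sketch that transcribes it into a Python dictionary and looks up the entries for the current host. The names PREBUILT_BACKENDS, SYSTEM_TO_KEY, backends_for, and total_combinations are hypothetical, the labels are descriptive rather than actual artifact file names, and the openEuler 310p and 910b ACL Graph variants are not broken out because the summary does not say how they pair with each architecture.

```python
import platform

# Platform/backend matrix transcribed from the release summary.
# Labels are descriptive only; they are NOT real artifact file names.
PREBUILT_BACKENDS = {
    # The summary groups the iOS XCFramework under macOS, so it is kept here.
    "macos": ["Apple Silicon", "Apple Silicon + KleidiAI", "Intel x64", "iOS XCFramework"],
    "linux": ["x64 CPU", "arm64 CPU", "Vulkan", "ROCm 7.2", "OpenVINO", "SYCL FP32", "SYCL FP16"],
    "windows": ["CPU", "arm64 CPU", "CUDA 12", "CUDA 13", "Vulkan", "SYCL", "HIP"],
    "android": ["arm64"],
    # Assumption: 310p / 910b ACL Graph variants are not split out per architecture.
    "openeuler": ["x86", "aarch64"],
}

# Maps platform.system() values to the keys above (desktop hosts only).
SYSTEM_TO_KEY = {"Darwin": "macos", "Linux": "linux", "Windows": "windows"}


def backends_for(system_name: str) -> list[str]:
    """Return the listed builds for a platform.system() value, or [] if none."""
    key = SYSTEM_TO_KEY.get(system_name)
    return PREBUILT_BACKENDS.get(key, [])


def total_combinations() -> int:
    """Count every platform/backend entry in the table."""
    return sum(len(builds) for builds in PREBUILT_BACKENDS.values())


if __name__ == "__main__":
    host = platform.system()
    print(f"{host}: {backends_for(host) or 'no prebuilt build listed'}")
    print(f"combinations listed: {total_combinations()}")
```

Counted this way the table has 21 entries, in line with the "20+" figure in the key points. The exact number depends on how variants such as SYCL FP32/FP16 are split.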

Key Points
  • Fixes --fit verbosity flag behavior when combined with --verbosity 4
  • Provides prebuilt binaries for 20+ platform/backend combinations including CUDA 12/13, ROCm, Vulkan, and KleidiAI
  • Supports macOS, Windows, Linux, Android, iOS, and openEuler across x86 and arm64 architectures

Why It Matters

Prebuilt binaries let developers run LLMs locally on a wide range of devices, reducing cloud dependency and latency.