Developer Tools

b8999

Critical fix for --tensor-type when default qtype is overridden.

Deep Dive

The open-source llama.cpp project, known for running large language models locally, has released version b8999. This patch addresses a regression, acknowledged by the maintainer who introduced it, in which the `--tensor-type` flag was not applied when the default quantization type was overridden. The fix was contributed by Anai-Guo (re-submitted under the project's new contributor policy) and closes issue #22544.
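The affected workflow can be illustrated with a `llama-quantize` invocation; the model filenames and tensor names below are illustrative, not taken from the release notes:

```shell
# Quantize with Q4_K_M as the overall (default) type, while overriding two
# specific tensors to Q8_0 via --tensor-type TENSOR=TYPE.
# The b8999 fix ensures these per-tensor overrides take effect even when the
# default qtype is overridden, instead of being silently dropped.
./llama-quantize \
    --tensor-type output=q8_0 \
    --tensor-type token_embd=q8_0 \
    model-f16.gguf model-q4_k_m.gguf Q4_K_M
```

Per-tensor overrides like this are commonly used to keep quality-sensitive tensors (such as output and embedding weights) at a higher precision than the rest of the model.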

Alongside the bug fix, b8999 provides pre-built binaries for a wide range of platforms: macOS (Apple Silicon and Intel, with optional KleidiAI acceleration), Linux (CPU, GPU via Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16, and s390x), Windows (CPU, CUDA 12/13, Vulkan, SYCL, HIP), Android (arm64 CPU), and openEuler with Ascend NPU support. This ensures users can quickly deploy the fix across diverse hardware.

Key Points
  • Fixes a bug where `--tensor-type` was ignored when the default qtype was overridden
  • Patch submitted by Anai-Guo, re-submitted under new contributor policy
  • Pre-built binaries for 20+ platform variants including CUDA, Vulkan, ROCm, and Ascend

Why It Matters

Maintains reliability for local LLM deployments with custom quantization settings across all major platforms.