b8908
A negative n_discard value could crash or exploit AI inference servers.
The llama.cpp project, which has over 106,000 stars on GitHub, released version b8908 on April 23, 2025. This release addresses a critical security vulnerability (CVE-2026-21869) with a CVSS score of 8.8. The bug was a heap-buffer-overflow in the server's update_slots() context-shift loop, triggered when a client sent a negative n_discard value in JSON. It is classified as CWE-787 (Out-of-bounds Write), meaning an attacker could corrupt memory, crash the server, or potentially execute arbitrary code.
The fix, contributed by Georgi Gerganov, clamps n_discard to 0 at the JSON parse boundary. When n_discard is 0, the server falls back to its default of discarding n_left/2 tokens, so normal context-shift behavior is preserved. The release also includes builds for multiple platforms: macOS (Apple Silicon and Intel, with optional KleidiAI acceleration), Linux (x64, arm64, s390x with various backends including Vulkan, ROCm 7.2, OpenVINO, and SYCL), Windows (x64 and arm64 with CUDA 12/13, Vulkan, SYCL, and HIP), Android (arm64), and iOS (XCFramework).
- CVE-2026-21869 is a heap-buffer-overflow with CVSS 8.8 severity, triggered by negative n_discard from client JSON
- Fix clamps n_discard to 0 at JSON parse boundary, preventing memory corruption in update_slots()
- Release includes builds for macOS, Linux, Windows, Android, and iOS with multiple backend options
Why It Matters
Critical security fix for the most popular open-source LLM inference engine, protecting self-hosted AI servers.