b8908
A negative n_discard value could crash or exploit AI inference servers.
The llama.cpp project, which has over 106,000 stars on GitHub, released version b8908 on April 23, 2025. This release addresses a critical security vulnerability (CVE-2026-21869) with a CVSS score of 8.8. The bug was a heap-buffer-overflow in the server's update_slots() context-shift loop, triggered when a client sent a negative n_discard value in JSON. It is classified as CWE-787 (Out-of-bounds Write), meaning an attacker could corrupt memory, crash the server, or potentially execute arbitrary code.
The fix, contributed by Georgi Gerganov, clamps n_discard to 0 at the JSON parse boundary. When n_discard is 0, the server falls back to its default of discarding n_left/2 tokens, so normal context-shift behavior is preserved. The release also includes builds for multiple platforms: macOS (Apple Silicon and Intel, with optional KleidiAI acceleration), Linux (x64, arm64, s390x with various backends including Vulkan, ROCm 7.2, OpenVINO, and SYCL), Windows (x64 and arm64 with CUDA 12/13, Vulkan, SYCL, and HIP), Android (arm64), and iOS (XCFramework).
- CVE-2026-21869 is a heap-buffer-overflow with CVSS 8.8 severity, triggered by negative n_discard from client JSON
- Fix clamps n_discard to 0 at JSON parse boundary, preventing memory corruption in update_slots()
- Release includes builds for macOS, Linux, Windows, Android, and iOS with multiple backend options
Why It Matters
Critical security fix for the most popular open-source LLM inference engine, protecting self-hosted AI servers.