llama.cpp b9292 fixes perplexity overflow, adds platform builds
The latest llama.cpp release fixes an integer overflow in its perplexity calculation.
The llama.cpp project, a widely used open-source C++ implementation for running large language models locally, has released version b9292. The release primarily addresses an integer overflow bug in the perplexity calculation. Perplexity is a standard metric for how well a model predicts a reference text, with lower values indicating better predictions. The fix, contributed by Stanisław Szymczyk, prevents incorrect scores when processing long sequences. The release also ships pre-built binaries for a wide range of platforms and hardware backends.
The release includes builds for macOS on Apple Silicon (both standard and with KleidiAI acceleration enabled), Intel Macs, and iOS as an XCFramework. Linux users get CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP32/FP16) binaries, while Windows offers CPU (x64/arm64), CUDA 12, CUDA 13, Vulkan, SYCL, and HIP options. Android ARM64, openEuler (x86 and aarch64 with 310p and 910b variants), and ACL Graph support are also provided. This broad distribution lets developers and users run LLMs locally without compiling from source.
- Fixes integer overflow in perplexity calculation (PR #23496) for more accurate model evaluation.
- Provides pre-built binaries for macOS (Apple Silicon with KleidiAI, Intel, iOS), Linux (CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (CPU, CUDA 12/13, Vulkan, SYCL, HIP), Android, and openEuler.
- Enables local LLM inference across diverse hardware without manual compilation.
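To make the overflow fix more concrete, here is a minimal C++ sketch of how a perplexity score is derived from per-token log-probabilities, and how an index into a large logits buffer can exceed the 32-bit integer range on long sequences. This is not the code from PR #23496: the helper names (`logit_offset`, `overflows_int32`, `perplexity`) are hypothetical, and the assumption that the bug involved a 32-bit index product is made purely for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: flat offset of a token's row in an
// (n_tokens x n_vocab) logits buffer. The product is computed in
// 64 bits so it cannot wrap for realistic sizes.
inline int64_t logit_offset(int64_t token_idx, int64_t n_vocab) {
    return token_idx * n_vocab;
}

// Hypothetical helper: reports whether the same offset would not fit
// in a signed 32-bit integer, i.e. whether 32-bit arithmetic would overflow.
inline bool overflows_int32(int64_t token_idx, int64_t n_vocab) {
    return logit_offset(token_idx, n_vocab) >
           static_cast<int64_t>(std::numeric_limits<int32_t>::max());
}

// Perplexity is exp of the mean negative log-likelihood over the
// evaluated tokens. Input values are natural-log probabilities.
inline double perplexity(const std::vector<double>& token_log_probs) {
    assert(!token_log_probs.empty());
    double nll = 0.0;
    for (double lp : token_log_probs) {
        nll -= lp;  // accumulate negative log-likelihood
    }
    return std::exp(nll / static_cast<double>(token_log_probs.size()));
}
```

With an assumed vocabulary of 128,000 entries, the offset for token 20,000 is 2,560,000,000, which is above the 32-bit signed maximum of 2,147,483,647. An index computed in 32 bits at that point would read the wrong logits and produce a wrong score, which is the general class of error that widening to 64 bits avoids.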
Why It Matters
The release streamlines local LLM deployment through broad platform support and fixes a bug that could produce incorrect perplexity scores on long sequences.