llama.cpp b9375 fixes Arm SVE bug, switches accumulation from F16 to F32
Critical Arm SVE fix improves precision on SVE-capable ARM64 hardware
The llama.cpp project, a widely used C/C++ implementation for running large language models locally, has released version b9375 with a critical fix for its use of the Arm Scalable Vector Extension (SVE). The bug, located in vec.h and vec.cpp, caused incorrect accumulation behavior in the SVE code paths. The fix changes the accumulation type from F16 to F32, so running sums in vector operations keep more mantissa bits and lose less information to rounding. The change matters only on CPUs that implement SVE, such as ARM64 Linux servers and some recent Android devices. Apple Silicon (M-series) chips do not implement non-streaming SVE, so Macs are unlikely to be affected.
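To see why the accumulator type matters, the following minimal Python sketch emulates a dot product whose running sum is rounded to F16 or to F32 after every step. It is an illustration only, not the code from vec.h or vec.cpp: the helper names (`round_f16`, `round_f32`, `dot_with_accumulator`) and the input values are hypothetical, and the rounding is emulated with the standard `struct` module rather than with SVE instructions.

```python
import struct

def round_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (F16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision (F32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot_with_accumulator(a, b, round_acc):
    """Dot product whose running sum is rounded by round_acc after every add.

    The add itself is done in float64 and then rounded, which is an
    approximation of a native narrow-precision add.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + x * y)
    return acc

# Hypothetical inputs: 4096 copies of the F16 value closest to 0.1, times 1.0.
N = 4096
a = [round_f16(0.1)] * N
b = [1.0] * N

reference = sum(x * y for x, y in zip(a, b))      # float64 reference sum
f16_sum = dot_with_accumulator(a, b, round_f16)   # accumulator kept in F16
f32_sum = dot_with_accumulator(a, b, round_f32)   # accumulator kept in F32

if __name__ == "__main__":
    print(f"float64 reference: {reference}")
    print(f"F16 accumulator:   {f16_sum}")
    print(f"F32 accumulator:   {f32_sum}")
```

With these inputs the F16 accumulator stops growing once it reaches 256, because adjacent F16 values in that range are 0.25 apart and an addend of about 0.1 is rounded away. The F32 accumulator has enough mantissa bits to keep tracking the reference sum.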
The update, signed off by Martin Klacer and Milos Puzovic from Arm, also ships updated build artifacts for all major platforms. Apple Silicon users get both standard and KleidiAI-enabled builds, while Linux builds cover x64, ARM64, and s390x with Vulkan, ROCm 7.2, OpenVINO, and SYCL backends. Windows builds include CUDA (12 and 13), Vulkan, and HIP, and Android ARM64 CPU builds are also included. Although the release adds no new features, the precision fix is important for developers and hobbyists running quantized models on SVE-capable ARM hardware, where numerical accuracy directly affects model output quality.
- Fixes Arm SVE usage bug in vec.h/vec.cpp that caused incorrect accumulation with F16
- Changes accumulation type to F32 for higher precision on SVE-capable ARM platforms
- Build artifacts updated for macOS (Apple Silicon & Intel), Linux (x64, ARM64, s390x), Windows (x64, ARM64), and Android (ARM64)
Why It Matters
Users of SVE-capable ARM hardware, such as ARM servers and some Android devices, get more accurate LLM inference, with no performance regression reported.
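Whether a given machine is in that group depends on whether its CPU exposes SVE. On ARM64 Linux, the kernel lists CPU capabilities on the `Features` line of `/proc/cpuinfo`, where SVE appears as the flag `sve`. The sketch below parses that text; the function names and the sample string are hypothetical, and a positive result only shows that the CPU advertises SVE, not that a particular llama.cpp build was compiled to use it.

```python
from pathlib import Path

def cpu_features(cpuinfo_text: str) -> set:
    """Collect the flags from every 'Features' line of ARM64 /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            features.update(value.split())
    return features

def has_sve(cpuinfo_text: str) -> bool:
    """Return True if the CPU advertises the 'sve' capability flag."""
    return "sve" in cpu_features(cpuinfo_text)

# Hypothetical sample in the layout used by ARM64 Linux kernels.
SAMPLE = "processor\t: 0\nFeatures\t: fp asimd evtstrm aes sha1 sha2 crc32 sve\n"

if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("SVE advertised:", has_sve(path.read_text()))
    else:
        print("No /proc/cpuinfo here; sample says:", has_sve(SAMPLE))
```

On x86 Linux the equivalent line is named `flags`, so the function returns False there.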