llama.cpp b9375 fixes Arm SVE bug, swaps F16 for F32 accumulation

llama.cpp Releases May 28, 2026

⚡Critical Arm SVE fix improves accumulation precision on SVE-capable ARM64 systems

Deep Dive

The llama.cpp project, a widely used C/C++ implementation for running large language models locally, has released version b9375 with a fix for how its CPU code uses the Arm Scalable Vector Extension (SVE). The bug, located in vec.h and vec.cpp, caused incorrect accumulation when SVE instructions were used with F16 data. The fix changes the accumulation type from F16 to F32, so partial sums keep more significand bits and lose less to rounding as a vector operation runs. This matters most on hardware that actually implements SVE, such as many ARM64 Linux servers and some recent Android devices. Apple Silicon (M-series) Macs do not expose standard SVE and normally run the NEON code paths instead, so they are unlikely to be affected by this particular bug.
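To make the precision argument concrete, the following is a minimal scalar sketch in Python, not llama.cpp's SVE code and not a reproduction of the bug itself. It only illustrates why the width of the accumulator matters. The helper names `round_to` and `dot` are hypothetical, and half precision is emulated with the standard library's `struct` format `"e"` (IEEE binary16).

```python
import struct

def round_to(fmt, value):
    """Round a Python float to the precision of a struct format ("e" = F16, "f" = F32)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

def dot(xs, ys, acc_fmt):
    """Dot product of F16 inputs whose running sum is stored at acc_fmt precision."""
    acc = 0.0
    for x, y in zip(xs, ys):
        # Inputs are treated as F16 in both cases; only the accumulator width differs.
        product = round_to("e", x) * round_to("e", y)
        # Store the running sum back at the accumulator's precision after every step.
        acc = round_to(acc_fmt, acc + product)
    return acc

xs = [1.0] * 4096
ys = [1.0] * 4096

f16_result = dot(xs, ys, "e")  # accumulator kept in F16
f32_result = dot(xs, ys, "f")  # accumulator kept in F32

print("F16 accumulator:", f16_result)  # 2048.0
print("F32 accumulator:", f32_result)  # 4096.0
```

With an F16 accumulator the sum stalls at 2048, because F16 has an 11-bit significand and 2048 + 1 rounds back to 2048. The F32 accumulator reaches the exact answer, 4096. Real model weights are not all ones, but the same effect shows up as a gradual loss of low-order bits over long reductions.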

The change, signed off by Martin Klacer and Milos Puzovic of Arm, ships alongside refreshed build artifacts for all major platforms. Apple Silicon users get both standard and KleidiAI-enabled builds. Linux builds cover x64, ARM64, and s390x, with Vulkan, ROCm 7.2, OpenVINO, and SYCL backends. Windows builds include CUDA (12 and 13), Vulkan, and HIP, and an Android ARM64 CPU build is also provided. The release adds no new features, but the precision fix is relevant to developers and hobbyists running models on SVE-capable ARM hardware, where numerical accuracy affects output quality.

Key Points

Fixes Arm SVE usage bug in vec.h/vec.cpp that caused incorrect accumulation with F16
Changes accumulation type to F32 for higher precision on ARM platforms
Build artifacts updated for macOS (Apple Silicon & Intel), Linux (x64, ARM64, s390x), Windows (x64, ARM64), and Android (ARM64)

Why It Matters

Users of SVE-capable ARM devices, such as ARM64 servers and some Android phones, get more accurate LLM inference.
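Whether the fix is relevant to a given machine depends on whether its CPU reports SVE at all. On ARM64 Linux, the kernel lists CPU features on the `Features` line of `/proc/cpuinfo`, where SVE appears as the token `sve`. The sketch below checks for that token. The function names `cpu_reports_sve` and `read_cpuinfo` are hypothetical. The check only reflects hardware support, not whether a particular llama.cpp build was compiled with SVE enabled.

```python
def cpu_reports_sve(cpuinfo_text):
    """Return True if any 'Features' line in Linux ARM64 cpuinfo text lists 'sve'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match the exact token so that 'sve2' alone does not count as 'sve'.
        if key.strip().lower() == "features" and "sve" in value.split():
            return True
    return False

def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, returning an empty string where it does not exist."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:
        return ""

if __name__ == "__main__":
    print("SVE reported:", cpu_reports_sve(read_cpuinfo()))
```

On macOS and Windows there is no `/proc/cpuinfo`, so this sketch simply reports `False` there.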
