b9082
New release optimizes AI inference on Qualcomm Hexagon DSPs with L2 normalization.
The latest release of llama.cpp, version b9082, introduces L2 normalization support for the Hexagon backend, adding an L2_NORM operation with a custom HVX (Hexagon Vector eXtensions) kernel for efficient vector normalization on Qualcomm's Hexagon DSP. L2 normalization is essential for many AI tasks, such as embedding normalization in retrieval-augmented generation and similarity search. By offloading this operation to the DSP, llama.cpp can achieve lower latency and reduced power consumption on mobile and edge devices. The update was co-authored by Max Krasnyansky of Qualcomm, highlighting ongoing collaboration to optimize AI inference on Qualcomm hardware. With over 109,000 GitHub stars, llama.cpp remains one of the most popular frameworks for running large language models locally.
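In scalar terms, L2 normalization divides each element of a vector by the vector's Euclidean length. Below is a minimal C++ sketch of the computation the HVX kernel accelerates; the function name, epsilon guard, and use of std::vector are illustrative assumptions, not llama.cpp's actual implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative scalar reference for L2 normalization: x <- x / max(||x||, eps).
// The Hexagon HVX kernel computes an equivalent result using wide SIMD
// vectors; this name and signature are not llama.cpp symbols.
void l2_normalize(std::vector<float>& x, float eps = 1e-12f) {
    float sum_sq = 0.0f;
    for (float v : x) sum_sq += v * v;          // accumulate squared magnitudes
    float inv_norm = 1.0f / std::max(std::sqrt(sum_sq), eps);
    for (float& v : x) v *= inv_norm;           // scale to unit length
}
```

A vectorized kernel performs the same two passes with wide SIMD registers, processing many elements per instruction, which is where the latency and power savings on the DSP come from.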
In addition to the Hexagon enhancements, release b9082 ships pre-built binaries for multiple platforms: macOS (Apple Silicon and Intel), Linux (x64, arm64, s390x, with Vulkan, ROCm, OpenVINO, and SYCL support), Windows (x64, arm64, CUDA 12/13, Vulkan, SYCL, HIP), Android (arm64 CPU), and openEuler (x86 and aarch64 with ACL Graph), letting developers deploy llama.cpp across diverse environments without building from source. The addition of L2 normalization is part of a larger effort to extend Hexagon backend capabilities, following the pattern of other unary operations. As AI inference moves increasingly to edge devices, optimizations like these are critical for real-time, low-power applications. The release is available on GitHub with assets for each platform.
- Adds L2 normalization (L2_NORM) operation with a custom HVX kernel for Hexagon DSP
- Co-authored by Qualcomm engineer Max Krasnyansky, reflecting the ongoing collaboration with Qualcomm
- Includes pre-built binaries for macOS, Linux, Windows, Android, and openEuler with various accelerators
Why It Matters
Optimized L2 norm on Hexagon DSP enables faster, more efficient AI inference on mobile and edge devices.
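The similarity-search payoff can be sketched concretely: once embeddings are L2-normalized, cosine similarity reduces to a plain dot product, so each query-document comparison skips the per-vector norm computations. The helper names below are illustrative, not part of the llama.cpp API:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative helper, not a llama.cpp symbol: scale x to unit length.
void l2_normalize(std::vector<float>& x, float eps = 1e-12f) {
    float sum_sq = 0.0f;
    for (float v : x) sum_sq += v * v;
    float inv_norm = 1.0f / std::max(std::sqrt(sum_sq), eps);
    for (float& v : x) v *= inv_norm;
}

// For unit-length a and b, this dot product is exactly their cosine similarity.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}
```

Normalizing embeddings once at creation time, on the DSP, amortizes the cost across every subsequent comparison.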