Developer Tools

llama.cpp b9243 adds Hexagon MROPE/IMROPE for on-device AI

New release brings multimodal rotary position embedding support to Qualcomm Hexagon NPUs.

Deep Dive

The llama.cpp project, known for running large language models locally on consumer hardware, has shipped build b9243. The release extends the Hexagon backend by adding MROPE (multimodal rotary position embedding) and IMROPE (interleaved MROPE) modes to the rope op that runs on the HTP (Hexagon Tensor Processor). Rotary embeddings encode token position inside transformer attention. The multimodal variants split each position into several components, such as time, height, and width, which vision-language models use to place image and video tokens. Handling these modes in the Hexagon rope op lets that step run on the NPU itself, which can speed up inference on Snapdragon-powered phones, cars, and edge devices.
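To make the difference between the two modes concrete, here is a minimal Python sketch of how a multi-component position can be mapped onto rotary pairs. It is an illustration under stated assumptions, not the ggml or Hexagon implementation: the function names, the three-component `(t, h, w)` position, the half-split pairing of elements, and the simplified interleaving rule (cycle through components, fall back to the first one when a component's budget is used up) are all choices made for this example.

```python
import math

def pair_component_mrope(pair_idx, sections):
    """Contiguous layout: the first sections[0] pairs use component 0,
    the next sections[1] pairs use component 1, and so on."""
    bound = 0
    for comp, size in enumerate(sections):
        bound += size
        if pair_idx < bound:
            return comp
    raise ValueError("pair index outside the configured sections")

def pair_component_imrope(pair_idx, sections):
    """Interleaved layout (simplified assumption): cycle t, h, w, t, h, w, ...
    and fall back to component 0 once a component's budget is exhausted."""
    n = len(sections)
    comp = pair_idx % n
    if pair_idx < n * sections[comp]:
        return comp
    return 0

def rope_multi(x, pos, sections, base=10000.0, interleaved=False):
    """Rotate vector x (even length) by a multi-component position pos.
    Element i is paired with element i + len(x) // 2 (an assumption here)."""
    half = len(x) // 2
    if sum(sections) != half:
        raise ValueError("sections must sum to len(x) // 2")
    pick = pair_component_imrope if interleaved else pair_component_mrope
    out = list(x)
    for i in range(half):
        p = pos[pick(i, sections)]               # which position component drives this pair
        theta = p * base ** (-2.0 * i / len(x))  # per-pair rotation angle
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out

if __name__ == "__main__":
    sections = [2, 1, 1]  # hypothetical split of 4 rotary pairs over (t, h, w)
    print([pair_component_mrope(i, sections) for i in range(4)])   # [0, 0, 1, 2]
    print([pair_component_imrope(i, sections) for i in range(4)])  # [0, 1, 2, 0]
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(rope_multi(x, (3, 7, 11), sections))
    print(rope_multi(x, (3, 7, 11), sections, interleaved=True))
```

When all position components are equal, as they would be for plain text tokens, both layouts produce the same result as ordinary rotary embedding. The two modes only diverge when the components differ, which is the case for image and video tokens.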

Beyond the Hexagon improvements, b9243 ships prebuilt binaries for a wide range of platforms: macOS (Apple Silicon with and without KleidiAI, plus Intel), Linux (x64, arm64, s390x, with Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Windows (x64 and arm64 CPU, plus CUDA 12/13, Vulkan, SYCL, HIP), Android (arm64 CPU), iOS (XCFramework), and openEuler (x86 and aarch64 with compute acceleration). With this coverage, developers can run the release on most current hardware without building from source.

The update also includes general robustness fixes, though the headline feature is the Hexagon rope work. For developers building edge AI applications, keeping more of the model on the Hexagon NPU can mean lower latency and power consumption on Qualcomm platforms, which helps bring LLM capabilities to mobile and embedded scenarios. The project has over 112k stars on GitHub, a sign of strong community interest in local AI inference.

Key Points
  • New MROPE and IMROPE support in Hexagon HTP rope op for efficient rotary position embeddings on Qualcomm DSPs
  • Cross-platform builds include macOS (Apple Silicon/Intel), Windows (CPU/CUDA/Vulkan/SYCL), Linux (x64/arm64/ROCm/OpenVINO), Android, iOS, and openEuler
  • The Hexagon rope work in b9243 targets lower latency and power draw for LLM inference on mobile and edge devices using Hexagon hardware

Why It Matters

The release makes local AI on Qualcomm-powered devices more capable and potentially faster and more efficient, expanding edge deployment of large language models.