
llama.cpp b9265: Hexagon SSM-Conv fixes boost large prompt performance

New release optimizes SSM convolution on Hexagon for large prompts with VTCM fixes

Deep Dive

The latest llama.cpp release, tagged b9265, focuses on performance and stability improvements for the Hexagon DSP backend, specifically its SSM convolution (ssm-conv) operator. Key changes include a fix for errors that previously occurred with large prompts, the removal of gather operations in favor of more efficient memory access, and better handling of VTCM (Vector Tightly Coupled Memory), the DSP's on-chip memory. The developers also relaxed the gating conditions for ssm-conv operations and added a new prefill backend test to validate these changes. Additionally, the rope_cache_init function was uninlined to prevent a breakage introduced when rebasing with the SSM_CONV changes.
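For readers unfamiliar with the operator: in Mamba-style models, ssm-conv is a causal, depthwise 1D convolution over the token dimension, where each channel has its own short filter and each output token sees a sliding window of d_conv inputs. The sketch below is a plain reference version for a single sequence, modeled on how ggml's CPU path computes it; the function name and the list-of-lists layout are illustrative assumptions, not the Hexagon kernel from this release.

```python
from typing import List


def ssm_conv_ref(sx: List[List[float]], c: List[List[float]]) -> List[List[float]]:
    """Reference causal depthwise 1D convolution (single sequence).

    Hypothetical helper for illustration only.
    sx: [d_inner][d_conv - 1 + n_t]  per-channel inputs; the first d_conv - 1
        columns are the conv state carried over from earlier tokens
    c:  [d_inner][d_conv]            per-channel filter weights
    returns: [n_t][d_inner]
    """
    d_inner = len(c)
    d_conv = len(c[0])
    n_t = len(sx[0]) - (d_conv - 1)
    out = [[0.0] * d_inner for _ in range(n_t)]
    for t in range(n_t):
        for i in range(d_inner):
            # Output token t of channel i is a dot product of the filter
            # with the d_conv consecutive inputs starting at column t.
            acc = 0.0
            for k in range(d_conv):
                acc += sx[i][t + k] * c[i][k]
            out[t][i] = acc
    return out


# One channel, d_conv = 4, zeroed state, prompt tokens 1, 2, 3:
print(ssm_conv_ref([[0, 0, 0, 1, 2, 3]], [[1, 1, 1, 1]]))  # [[1.0], [3.0], [6.0]]
```

Because the input window for each channel is contiguous along the token axis, a kernel can in principle read it with plain strided loads rather than gathers.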

This release continues llama.cpp's tradition of broad platform support. Prebuilt binaries are available for macOS (Apple Silicon with and without KleidiAI, Intel, iOS XCFramework), Linux (x64/arm64 CPU, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Android (arm64 CPU), Windows (x64/arm64 CPU, CUDA 12.4, CUDA 13.1, Vulkan, SYCL, HIP), and openEuler (x86 and aarch64, with 310p/910b ACL Graph builds). The Hexagon fixes in this release are particularly important for developers deploying large language models on edge devices powered by Qualcomm Hexagon processors.

Key Points
  • Fixes SSM convolution errors when processing large prompts on Hexagon architecture
  • Removes gather operations and improves VTCM management for better efficiency
  • Adds a new prefill backend test to validate Hexagon SSM conv performance
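One general way to keep a large prefill within a fixed on-chip budget such as VTCM is to tile over tokens: each tile needs only its own tokens plus d_conv - 1 columns of history per channel, read as one contiguous slice. The sketch below shows that this tiling reproduces the untiled result exactly. The release notes do not describe the kernel's actual tiling scheme, so `ssm_conv_tiled` and its `max_tokens_per_tile` parameter are hypothetical stand-ins for illustration.

```python
from typing import List


def ssm_conv_tiled(sx: List[List[float]], c: List[List[float]],
                   max_tokens_per_tile: int) -> List[List[float]]:
    """Token-tiled causal depthwise conv; same output as an untiled pass.

    Hypothetical sketch. max_tokens_per_tile models a fixed memory budget:
    each tile keeps max_tokens_per_tile + d_conv - 1 inputs per channel.
    sx: [d_inner][d_conv - 1 + n_t], c: [d_inner][d_conv] -> [n_t][d_inner]
    """
    if max_tokens_per_tile < 1:
        raise ValueError("max_tokens_per_tile must be >= 1")
    d_inner = len(c)
    d_conv = len(c[0])
    n_t = len(sx[0]) - (d_conv - 1)
    out = [[0.0] * d_inner for _ in range(n_t)]
    for t0 in range(0, n_t, max_tokens_per_tile):
        t1 = min(t0 + max_tokens_per_tile, n_t)
        for i in range(d_inner):
            # Contiguous slice: this tile's tokens plus d_conv - 1 columns
            # of history to their left. No per-element gather is needed.
            window = sx[i][t0:t1 + d_conv - 1]
            w = c[i]
            for t in range(t1 - t0):
                out[t0 + t][i] = sum(window[t + k] * w[k] for k in range(d_conv))
    return out


sx = [[0, 0, 1, 2, 3], [1, 1, 1, 1, 1]]  # d_inner = 2, d_conv = 3, n_t = 3
c = [[1, 2, 3], [0, 0, 1]]
for tile in (1, 2, 8):
    print(tile, ssm_conv_tiled(sx, c, tile))  # same result for every tile size
```

The working set per tile grows with the tile size, not with the prompt length, which is the property a fixed-size on-chip buffer needs.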

Why It Matters

Enables more reliable and efficient LLM inference on Qualcomm Hexagon hardware for edge AI applications.