llama.cpp b9221 adds PAD op with Hexagon HVX kernels

llama.cpp Releases May 19, 2026

⚡ New vectorized padding kernel supports zero-padding and circular padding across all four tensor dimensions.

Deep Dive

llama.cpp release b9221 adds a PAD operation kernel for the Hexagon HTP (Hexagon Tensor Processor) backend, which targets Qualcomm hardware. The kernel is vectorized with HVX (Hexagon Vector eXtensions) instructions. Pull request #23078 implements GGML_OP_PAD with support for both zero-padding and circular padding across all four tensor dimensions. Padding is a common building block wherever a layer needs aligned dimensions or a specific tensor shape.
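
To make the two padding modes concrete, here is a minimal scalar sketch in Python. It is not the HVX kernel from the pull request and does not use any ggml API. The function name pad_4d, the separate left/right pad counts (lp, rp), and the flat layout with dimension 0 varying fastest are all assumptions chosen for illustration.

```python
def pad_4d(src, ne, lp, rp, circular=False):
    """Pad a 4-D tensor stored as a flat list (scalar reference, not HVX).

    src      -- flat list of ne[0]*ne[1]*ne[2]*ne[3] values, ne[0] varying fastest
    ne       -- source extents per dimension (all assumed > 0)
    lp, rp   -- hypothetical per-dimension left/right pad counts
    circular -- False: fill the border with zeros; True: wrap around the source
    Returns (dst, out_ne).
    """
    out_ne = [ne[d] + lp[d] + rp[d] for d in range(4)]
    dst = [0.0] * (out_ne[0] * out_ne[1] * out_ne[2] * out_ne[3])

    def src_coord(d, i):
        # Map destination coordinate i along dimension d back to the source.
        j = i - lp[d]
        if circular:
            return j % ne[d]  # Python's % is non-negative for a positive modulus
        return j if 0 <= j < ne[d] else None  # None marks a zero-filled cell

    k = 0  # running flat index into dst, dimension 0 fastest
    for i3 in range(out_ne[3]):
        for i2 in range(out_ne[2]):
            for i1 in range(out_ne[1]):
                for i0 in range(out_ne[0]):
                    c = [src_coord(d, i) for d, i in enumerate((i0, i1, i2, i3))]
                    if None not in c:
                        dst[k] = src[c[0] + ne[0] * (c[1] + ne[1] * (c[2] + ne[2] * c[3]))]
                    k += 1
    return dst, out_ne


row = [1.0, 2.0, 3.0]
zero, shape = pad_4d(row, [3, 1, 1, 1], [1, 0, 0, 0], [2, 0, 0, 0])
# zero == [0.0, 1.0, 2.0, 3.0, 0.0, 0.0], shape == [6, 1, 1, 1]
wrap, _ = pad_4d(row, [3, 1, 1, 1], [1, 0, 0, 0], [2, 0, 0, 0], circular=True)
# wrap == [3.0, 1.0, 2.0, 3.0, 1.0, 2.0]
```

In zero mode, border cells keep the initial fill value. In circular mode, every destination coordinate is reduced modulo the source extent, so the border repeats the tensor's opposite edge. A vectorized kernel would presumably process whole rows of dimension 0 at a time rather than one element per iteration, but that detail is outside this sketch.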

The addition matters for deploying large language models on edge devices that use Qualcomm's Hexagon DSP (Digital Signal Processor): padding can run directly on the accelerator instead of falling back to the CPU. The release also cleans up duplicate op cases and fixes macro alignment. Binary releases are available for macOS (Apple Silicon and Intel), Linux (x64 and ARM64, with Vulkan, ROCm, OpenVINO, and SYCL support), Android (ARM64), Windows (CPU, CUDA 12/13, Vulkan, SYCL, HIP), and openEuler with Ascend NPU support.

Key Points

New GGML_OP_PAD HVX kernel for Hexagon HTP backend (PR #23078)
Supports zero-padding and circular padding across all 4 tensor dimensions
Part of llama.cpp b9221 release with multi-platform binary builds

Why It Matters

Running PAD on the Hexagon HTP removes a CPU fallback for that op, which supports more efficient on-device LLM inference on Qualcomm Hexagon processors.
