b9087
New release boosts LLM performance on Intel GPUs with reordered MMVQ paths.
The llama.cpp community released b9087, a new tag that brings targeted optimizations to the SYCL backend (SYCL is a cross-platform programming model for heterogeneous computing, used here mainly to target Intel GPUs). The key changes, contributed by Intel engineer Chun Tao, add reordered MMVQ and dequantization paths for Q5_K and a reordered MMVQ path for Q8_0. These reorder optimizations streamline the quantized matrix-vector multiplication and dequantization steps, which are critical for fast LLM inference on Intel GPUs. The commit also removes duplicate definitions to clean up the codebase.
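To make the idea concrete, below is a minimal CPU-side sketch of what a "reordered" quantized path generally means: the conventional layout interleaves each block's scale with its 32 quantized weights, while a reordered layout keeps all quants and all scales in separate contiguous streams so a matrix-vector kernel can read memory linearly. This is only an illustration of the concept under stated assumptions, not the actual SYCL kernels from the commit; the names (`q8_0_reordered`, `dot_q8_0_reordered`) are hypothetical, and the scale is a plain float here for simplicity (ggml stores it as fp16).

```c
/* Conceptual sketch of a reordered Q8_0 matrix-vector path (not ggml's
 * actual SYCL kernels). Layouts are simplified for illustration. */
#include <stddef.h>
#include <stdint.h>

#define QK8_0 32  /* weights per Q8_0 block */

/* Array-of-structs layout: scale and quants interleaved per block. */
typedef struct {
    float  d;            /* per-block scale (fp16 in ggml, float here) */
    int8_t qs[QK8_0];    /* 32 quantized weights */
} block_q8_0;

/* Hypothetical "reordered" struct-of-arrays layout: all quants contiguous,
 * all scales contiguous, so each stream is read linearly. */
typedef struct {
    const int8_t *qs;    /* nblocks * QK8_0 quants */
    const float  *d;     /* nblocks scales */
} q8_0_reordered;

/* Dot product of one quantized row with a float vector, interleaved layout. */
static float dot_q8_0_aos(const block_q8_0 *row, const float *x, size_t nblocks) {
    float sum = 0.0f;
    for (size_t b = 0; b < nblocks; ++b) {
        float partial = 0.0f;
        for (int i = 0; i < QK8_0; ++i)
            partial += (float) row[b].qs[i] * x[b * QK8_0 + i];
        sum += row[b].d * partial;  /* apply the block scale once */
    }
    return sum;
}

/* Same computation on the reordered layout: quants and scales come from
 * two dense arrays instead of interleaved blocks. */
static float dot_q8_0_reordered(q8_0_reordered row, const float *x, size_t nblocks) {
    float sum = 0.0f;
    for (size_t b = 0; b < nblocks; ++b) {
        float partial = 0.0f;
        for (int i = 0; i < QK8_0; ++i)
            partial += (float) row.qs[b * QK8_0 + i] * x[b * QK8_0 + i];
        sum += row.d[b] * partial;
    }
    return sum;
}
```

The two functions compute the same result; the difference is purely in memory layout, which is where GPU kernels typically gain from more coalesced, predictable access patterns.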
The release ships prebuilt binaries for multiple environments: macOS (Apple Silicon and Intel, with optional KleidiAI), Linux (x64, arm64, s390x, plus Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Windows (x64/arm64 CPU, CUDA 12/13, Vulkan, SYCL, HIP), Android (arm64 CPU), and openEuler (x86 and aarch64 with ACL Graph). This broad support means developers using Intel hardware can leverage these SYCL speedups across various deployment targets.
- Intel's Chun Tao contributed Q5_K and Q8_0 reorder MMVQ/dequant optimizations for SYCL paths.
- Removed duplicate definitions to improve code quality and maintainability.
- Supports 30+ build assets across macOS, Linux, Windows, Android, and openEuler with multiple backends.
Why It Matters
llama.cpp's SYCL optimizations make LLM inference more efficient on Intel hardware, broadening deployment options for developers.