b9060
New release enables FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, and GATED_DELTA_NET on Intel GPUs.
The latest release of llama.cpp (b9060) brings significant SYCL backend enhancements, adding six new operations: FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, and GATED_DELTA_NET. These additions let users leverage Intel GPUs (and other SYCL-compatible hardware) for a broader set of model operations, including state-space model scans (SSM_SCAN) and gated delta networks (GATED_DELTA_NET). The release also fixes an abort crash during test-backend-ops and regenerates the ops.md documentation. The contributions came from Intel developers, with GPG-signed commits.
The project continues to see broad community adoption (109k stars, 17.9k forks) and provides pre-built binaries for macOS (Apple Silicon with optional KleidiAI, Intel, iOS XCFramework), Linux (x64, arm64, s390x, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Android arm64, and Windows (CPU, CUDA 12/13, Vulkan, SYCL, HIP). The SYCL additions matter most for users on Intel hardware, who can now run a broader set of operations without relying on CUDA or ROCm. This reduces vendor lock-in and improves portability across heterogeneous computing environments.
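To give a sense of what an SSM_SCAN-style operation computes, here is a minimal NumPy sketch of a linear state-space scan. This is an illustration of the recurrence only, not the ggml kernel: the actual SYCL implementation uses a selective, input-dependent parameterization and a different memory layout, and the function name and signature below are invented for this example.

```python
import numpy as np

def ssm_scan(A, B, C, x, h0=None):
    """Sequential state-space scan (illustrative reference only).

    Recurrence per timestep t:
        h_t = A @ h_{t-1} + B @ x_t
        y_t = C @ h_t
    """
    d_state = A.shape[0]
    h = np.zeros(d_state) if h0 is None else h0.copy()
    ys = []
    for x_t in x:           # iterate over the sequence dimension
        h = A @ h + B @ x_t  # update hidden state
        ys.append(C @ h)     # project state to output
    return np.stack(ys), h
```

Note that with A, B, and C all set to the identity, the scan degenerates to a cumulative sum over the sequence, which is why kernels like CUMSUM and SSM_SCAN often share backend infrastructure.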
- llama.cpp b9060 adds 6 SYCL operations: FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, GATED_DELTA_NET
- Fixes abort crash during test-backend-ops and adds scope_dbg_print for debugging SYCL ops
- Pre-built binaries available for 30+ platform targets across macOS, Linux, Windows, and Android, covering CPU, CUDA, Vulkan, ROCm, and SYCL backends
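For readers who want to try the new SYCL ops locally, the following build-and-test fragment is a sketch assuming the documented llama.cpp SYCL build flow with Intel oneAPI compilers; adjust paths and compiler names for your environment.

```shell
# Configure and build llama.cpp with the SYCL backend
# (assumes the oneAPI icx/icpx compilers are on PATH)
cmake -B build -DGGML_SYCL=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release

# Run the backend op tests filtered to one of the new operations
./build/bin/test-backend-ops test -o SSM_SCAN
```

The `-o` flag restricts test-backend-ops to a single operation, which makes it easy to verify each of the six new ops (FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, GATED_DELTA_NET) in isolation.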
Why It Matters
Lets developers run advanced LLM operations on Intel GPUs, reducing CUDA dependency and expanding hardware compatibility.