b9060
New release enables FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, and GATED_DELTA_NET on Intel GPUs.
The latest release of llama.cpp (b9060) brings significant SYCL backend enhancements, adding six new operations: FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, and GATED_DELTA_NET. These additions let users leverage Intel GPUs (and other SYCL-compatible hardware) for a broader set of model operations, including state-space model scans (SSM_SCAN) and gated delta networks (GATED_DELTA_NET). The release also fixes an abort crash during test-backend-ops and regenerates the ops.md documentation. The contributions came from Intel developers, with GPG-signed commits.
The project continues to see broad community adoption (109k stars, 17.9k forks) and provides pre-built binaries for macOS (Apple Silicon with optional KleidiAI, Intel, iOS XCFramework), Linux (x64, arm64, s390x, Vulkan, ROCm 7.2, OpenVINO, SYCL FP32/FP16), Android arm64, and Windows (CPU, CUDA 12/13, Vulkan, SYCL, HIP). The SYCL additions matter most for users on Intel hardware, who can now run a broader set of operations without relying on CUDA or ROCm. This reduces vendor lock-in and improves portability across heterogeneous computing environments.
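To give a sense of what an SSM_SCAN-style operation computes, here is a minimal NumPy sketch of a linear state-space scan. This is an illustration of the recurrence only, not the ggml kernel: the actual SYCL implementation uses a selective, input-dependent parameterization and a different memory layout, and the function name and signature below are invented for this example.

```python
import numpy as np

def ssm_scan(A, B, C, x, h0=None):
    """Sequential state-space scan (illustrative reference only).

    Recurrence per timestep t:
        h_t = A @ h_{t-1} + B @ x_t
        y_t = C @ h_t
    """
    d_state = A.shape[0]
    h = np.zeros(d_state) if h0 is None else h0.copy()
    ys = []
    for x_t in x:           # iterate over the sequence dimension
        h = A @ h + B @ x_t  # update hidden state
        ys.append(C @ h)     # project state to output
    return np.stack(ys), h
```

Note that with A, B, and C all set to the identity, the scan degenerates to a cumulative sum over the sequence, which is why kernels like CUMSUM and SSM_SCAN often share backend infrastructure.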
- llama.cpp b9060 adds 6 SYCL operations: FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, GATED_DELTA_NET
- Fixes abort crash during test-backend-ops and adds scope_dbg_print for debugging SYCL ops
- Pre-built binaries available for 30+ platform targets across macOS, Linux, Windows, and Android, covering CPU, CUDA, Vulkan, ROCm, and SYCL backends
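For readers who want to try the new SYCL ops locally, the following build-and-test fragment is a sketch assuming the documented llama.cpp SYCL build flow with Intel oneAPI compilers; adjust paths and compiler names for your environment.

```shell
# Configure and build llama.cpp with the SYCL backend
# (assumes the oneAPI icx/icpx compilers are on PATH)
cmake -B build -DGGML_SYCL=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release

# Run the backend op tests filtered to one of the new operations
./build/bin/test-backend-ops test -o SSM_SCAN
```

The `-o` flag restricts test-backend-ops to a single operation, which makes it easy to verify each of the six new ops (FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, GATED_DELTA_NET) in isolation.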
Why It Matters
Lets developers run advanced LLM operations on Intel GPUs, reducing CUDA dependency and expanding hardware compatibility.