llama.cpp b9331 revamps CI with separate workflows for 10+ backends
New release splits CI by backend, adding Android, HIP, WebGPU support separately...
The latest llama.cpp release (b9331) focuses on behind-the-scenes infrastructure. The core change is a restructuring of the CI pipeline: jobs that previously ran in monolithic workflows are now split into separate workflows per backend. Specifically, the release moves the Android, HIP (ROCm), WebGPU, RPC, s390x and PPC, and OpenCL builds into their own isolated workflows. As a result, a pull request that touches only, say, a GPU backend no longer needs to trigger unrelated CPU or Android jobs, which should reduce CI time and resource usage.
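The selection logic behind a per-backend split can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's actual configuration: the workflow names and path patterns below are hypothetical, and real GitHub Actions path filters use their own glob rules rather than Python's `fnmatch`.

```python
from fnmatch import fnmatch

# Hypothetical mapping from workflow name to the paths that should trigger it.
# These names and patterns are illustrative assumptions, not copied from the
# llama.cpp repository.
WORKFLOW_PATHS = {
    "build-android": ["examples/android/*"],
    "build-hip": ["ggml/src/ggml-hip/*", "ggml/src/ggml-cuda/*"],
    "build-webgpu": ["ggml/src/ggml-webgpu/*"],
    "build-opencl": ["ggml/src/ggml-opencl/*"],
}


def workflows_to_run(changed_files):
    """Return the sorted names of workflows whose patterns match a changed file.

    Note: fnmatch's '*' also matches '/', so each pattern covers nested
    directories. GitHub Actions globs behave differently ('**' is needed there).
    """
    selected = set()
    for name, patterns in WORKFLOW_PATHS.items():
        if any(fnmatch(path, pat) for path in changed_files for pat in patterns):
            selected.add(name)
    return sorted(selected)
```

Under these assumed patterns, `workflows_to_run(["ggml/src/ggml-webgpu/ggml-webgpu.cpp"])` selects only `build-webgpu`, while a change to `README.md` selects nothing. That is the behavior the split is meant to achieve: unrelated backends stay idle.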
Beyond CI, the release ships prebuilt binaries for most major platforms. macOS users get Apple Silicon (arm64) and Intel (x64) builds, plus an iOS XCFramework. Linux (Ubuntu) builds cover CPU on x64, arm64, and s390x, along with Vulkan, ROCm 7.2, OpenVINO, and SYCL FP32 variants. Android arm64 CPU builds are included. Windows users get CPU builds for x64 and arm64, plus CUDA 12 and 13 DLLs, Vulkan, SYCL, and HIP. openEuler builds target x86 and aarch64 with ACL Graph support. Together, these let users run llama.cpp on hardware ranging from desktop GPUs to edge devices without compiling from source.
- CI pipeline now has separate workflows for Android, HIP, WebGPU, RPC, s390x/PPC, and OpenCL backends, speeding up PR testing
- Prebuilt binaries shipped for 17+ platform/backend combinations including CUDA 12/13, ROCm 7.2, Vulkan, SYCL, and ACL
- Release includes macOS Apple Silicon (with KleidiAI), Intel, iOS XCFramework, and Android arm64 CPU builds
Why It Matters
Faster CI shortens the feedback loop on llama.cpp changes, and the wide range of prebuilt binaries makes local LLM deployment practical across diverse hardware.