llama.cpp b9825: Vulkan fix and expanded multi-platform support
New release fixes a Vulkan operator bug with zero-sized inputs and ships builds for a broad range of hardware.
The open-source llama.cpp project, hosted under ggml-org on GitHub, tagged release b9825 on June 27. The release's main change is a Vulkan back-end fix: the step operator now correctly handles zero-sized input tensors (#25036), an edge case that could previously cause crashes or unexpected behavior. The release notes are sparse, but the accompanying build matrix shows how wide platform coverage has become.
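The release notes do not describe how #25036 was fixed, so the following is only a generic illustration, not ggml's Vulkan code. It assumes the usual definition of a step function (1 where the input is positive, 0 otherwise) and shows the general pattern of guarding zero-element tensors before sizing GPU work; zero sizes need care on Vulkan in general, since, for example, a buffer cannot be created with size 0. The names `step_reference`, `workgroup_count`, `run_step`, and the `WORKGROUP_SIZE` of 256 are hypothetical.

```python
# Hypothetical sketch: NOT ggml's Vulkan implementation. It emulates an
# elementwise step op and an early-exit guard for zero-element inputs.

WORKGROUP_SIZE = 256  # assumed value; real shaders pick their own sizes


def step_reference(values):
    """Elementwise step on a CPU: 1.0 where x > 0, else 0.0."""
    return [1.0 if x > 0.0 else 0.0 for x in values]


def workgroup_count(n_elements, workgroup_size=WORKGROUP_SIZE):
    """Number of workgroups needed to cover n_elements (ceiling division)."""
    if n_elements < 0:
        raise ValueError("element count cannot be negative")
    return (n_elements + workgroup_size - 1) // workgroup_size


def run_step(values):
    """Dispatch-style wrapper with an explicit guard for empty tensors."""
    n = len(values)
    if n == 0:
        # Zero-sized input: nothing to compute, so skip the emulated
        # dispatch instead of allocating or binding empty buffers.
        return []
    out = []
    # Emulate each workgroup processing one contiguous chunk of the input.
    for g in range(workgroup_count(n)):
        chunk = values[g * WORKGROUP_SIZE:(g + 1) * WORKGROUP_SIZE]
        out.extend(step_reference(chunk))
    return out
```

For example, `run_step([-1.0, 0.0, 2.5])` returns `[0.0, 0.0, 1.0]`, and `run_step([])` returns `[]` without reaching the dispatch logic. Putting the guard at the entry point keeps the zero case out of every later size calculation.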
Builds now span multiple OS and hardware configurations. For macOS, there are Apple Silicon (arm64) builds with and without KleidiAI (Arm's library of optimized CPU micro-kernels for AI workloads), plus Intel x64. Linux gets CPU builds for x64, arm64, and s390x, along with Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP32/FP16) back-ends. Windows users get CPU binaries for x64 and arm64, plus accelerated back-ends: CUDA 12/13, Vulkan, OpenCL (Adreno), OpenVINO, SYCL, and HIP. Android receives arm64 CPU and OpenCL (Adreno) builds, and iOS is delivered as an XCFramework. In practice, prebuilt binaries now cover most mainstream desktop, server, and mobile hardware.
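To see which part of this matrix applies to a given machine, the summary above can be encoded as a small lookup keyed on Python's `platform.system()` and `platform.machine()`. This is a hypothetical helper, not part of llama.cpp: the labels are descriptive strings taken from this article, not real release asset file names, and back-ends are grouped per OS only, because the summary does not say which CPU architecture each accelerated build targets.

```python
import platform

# CPU architectures with prebuilt CPU binaries, per this article's summary.
CPU_BUILDS = {
    "macos": {"arm64", "x64"},
    "linux": {"x64", "arm64", "s390x"},
    "windows": {"x64", "arm64"},
    "android": {"arm64"},
}

# Other builds listed per OS (target architecture not specified in the summary).
OTHER_BUILDS = {
    "macos": ["CPU + KleidiAI (Apple Silicon)"],
    "linux": ["Vulkan", "ROCm 7.2", "OpenVINO", "SYCL FP32", "SYCL FP16"],
    "windows": ["CUDA 12", "CUDA 13", "Vulkan", "OpenCL (Adreno)",
                "OpenVINO", "SYCL", "HIP"],
    "android": ["OpenCL (Adreno)"],
}

# Normalize Python's OS and machine spellings to the article's labels.
_OS_ALIASES = {"darwin": "macos", "linux": "linux",
               "windows": "windows", "android": "android"}
_ARCH_ALIASES = {"x86_64": "x64", "amd64": "x64", "arm64": "arm64",
                 "aarch64": "arm64", "s390x": "s390x"}


def candidate_builds(system=None, machine=None):
    """Return (os, arch, has_cpu_build, other_builds); defaults to this host."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    os_name = _OS_ALIASES.get(system)
    arch = _ARCH_ALIASES.get(machine)
    if os_name is None or arch is None:
        # Unknown OS or architecture: nothing in the summary matches.
        return (os_name, arch, False, [])
    has_cpu = arch in CPU_BUILDS.get(os_name, set())
    return (os_name, arch, has_cpu, list(OTHER_BUILDS.get(os_name, [])))
```

For example, `candidate_builds("Windows", "AMD64")` normalizes to `("windows", "x64", True, [...])` with the CUDA, Vulkan, and other Windows entries. Before Python 3.13, `platform.system()` reports Android as `"Linux"`, so Android has to be passed explicitly. iOS is omitted because it ships as an XCFramework for embedding in apps rather than as a host binary.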
- Fixes Vulkan step operator bug for zero-length input tensors (#25036)
- Expands platform support to macOS (Apple Silicon with and without KleidiAI, Intel), Linux (x64/arm64/s390x with Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (x64/arm64 with CUDA 12/13, Vulkan, HIP, OpenCL, OpenVINO, SYCL), Android arm64 (CPU and OpenCL), and an iOS XCFramework
- Enables local LLM inference across consumer, server, and mobile hardware with multiple GPU back-ends
Why It Matters
llama.cpp's b9825 makes local AI inference more reliable and accessible across diverse hardware, from phones to datacenter GPUs.