llama.cpp b9825: Vulkan fix and expanded multi-platform support

llama.cpp Releases June 28, 2026

⚡New release with Vulkan operator fix for zero inputs and broader hardware compatibility...

Deep Dive

The open-source llama.cpp project, hosted under ggml-org on GitHub, has tagged release b9825 as of June 27. This release primarily addresses a Vulkan back-end issue: the step operator now correctly handles zero-sized inputs (issue #25036), which could previously cause crashes or unexpected behavior when processing edge-case tensors. While the release notes are sparse, the included build matrix reveals significant expansion of platform coverage.
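To make the edge case concrete, here is a minimal Python sketch of an element-wise step operator that tolerates empty input. It is not the Vulkan shader or the actual patch; the release notes do not describe the root cause. The function names, the workgroup size of 256, and the definition of step as "1 for positive values, 0 otherwise" are assumptions made for illustration.

```python
from typing import List

# Hypothetical workgroup size, chosen only for this illustration.
WORKGROUP_SIZE = 256


def workgroups_needed(n_elements: int, workgroup_size: int = WORKGROUP_SIZE) -> int:
    """Ceiling division: how many workgroups cover n_elements items."""
    return (n_elements + workgroup_size - 1) // workgroup_size


def step(values: List[float]) -> List[float]:
    """Element-wise step: 1.0 where the input is positive, else 0.0.

    An empty input returns an empty output immediately, so no work
    is scheduled for a tensor with zero elements.
    """
    if not values:
        return []
    return [1.0 if v > 0.0 else 0.0 for v in values]


if __name__ == "__main__":
    print(step([-1.0, 0.0, 2.5]))
    print(step([]))
    print(workgroups_needed(0))
```

The early return is the point of the sketch: a zero-element tensor needs zero workgroups, and code that assumes at least one can misbehave on exactly this kind of input.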

Builds span multiple operating systems and hardware configurations. For macOS, Apple Silicon (arm64) is supported with and without KleidiAI (Arm's library of optimized AI micro-kernels for Arm CPUs), plus Intel x64. Linux gets CPU builds for x64, arm64, and s390x, along with Vulkan, ROCm 7.2, OpenVINO, and SYCL (FP32/FP16) back-ends. Windows users get CPU binaries for x64 and arm64, plus accelerated back-ends: CUDA 12/13, Vulkan, OpenCL (Adreno), OpenVINO, SYCL, and HIP. Android receives arm64 CPU and OpenCL Adreno builds. iOS is delivered as an XCFramework. In practice, prebuilt binaries are available for most mainstream desktop, server, and mobile targets.
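The build matrix described above can be written down as a small lookup table. The following Python sketch only restates the article's list; the dictionary layout and the helper name `platforms_with` are hypothetical, and the article does not say which CPU architecture each accelerated back-end targets, so the table does not try to record that.

```python
from typing import Dict, List

# Build matrix as described in the article. "cpu" lists CPU-only builds,
# "accelerated" lists GPU/accelerator back-ends offered on that OS.
BUILD_MATRIX: Dict[str, Dict[str, List[str]]] = {
    "macos": {"cpu": ["arm64", "arm64 + KleidiAI", "x64"], "accelerated": []},
    "linux": {
        "cpu": ["x64", "arm64", "s390x"],
        "accelerated": ["Vulkan", "ROCm 7.2", "OpenVINO", "SYCL"],
    },
    "windows": {
        "cpu": ["x64", "arm64"],
        "accelerated": [
            "CUDA 12", "CUDA 13", "Vulkan", "OpenCL (Adreno)",
            "OpenVINO", "SYCL", "HIP",
        ],
    },
    "android": {"cpu": ["arm64"], "accelerated": ["OpenCL (Adreno)"]},
    # The article lists only the XCFramework packaging for iOS.
    "ios": {"cpu": [], "accelerated": []},
}


def platforms_with(backend: str) -> List[str]:
    """Return the operating systems that offer the given accelerated back-end."""
    return [
        os_name
        for os_name, builds in BUILD_MATRIX.items()
        if backend in builds["accelerated"]
    ]


if __name__ == "__main__":
    print(platforms_with("Vulkan"))
    print(platforms_with("OpenCL (Adreno)"))
```

Looking up a back-end this way shows at a glance that Vulkan binaries are listed for Linux and Windows, while the OpenCL Adreno builds are listed for Windows and Android.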

Key Points

Fixes Vulkan step operator bug for zero-length input tensors (#25036)
Expands platform support to macOS (Apple Silicon + KleidiAI, Intel), Linux (x64/arm64/s390x with Vulkan, ROCm 7.2, OpenVINO, SYCL), Windows (x64/arm64 with CUDA 12/13, Vulkan, HIP, OpenCL), Android arm64, and iOS XCFramework
Enables local LLM inference across consumer, server, and mobile hardware with multiple GPU back-ends

Why It Matters

llama.cpp's b9825 makes local AI inference more reliable and accessible across diverse hardware, from phones to datacenter GPUs.
