llama.cpp b9810 adds a cublasSgemmBatched mapping for HIP and MUSA builds

llama.cpp Releases June 27, 2026

⚡ New release adds a batched matrix multiplication mapping for HIP (AMD) and MUSA (Moore Threads) builds

Deep Dive

The ggml-org team released llama.cpp version b9810 on June 26, 2026. The update adds a cublasSgemmBatched mapping to the HIP and MUSA vendor headers. HIP (Heterogeneous-compute Interface for Portability) is AMD's CUDA-like programming interface, and MUSA is the comparable platform for Moore Threads GPUs. llama.cpp uses vendor headers to translate CUDA and cuBLAS names into their counterparts on each platform, so the same backend source can be built for all of them. Batched SGEMM (Single-precision GEneral Matrix Multiply) runs many independent matrix multiplications in a single call, a pattern that is common in transformer-based LLM inference. With the mapping in place, code that calls cublasSgemmBatched can also be compiled for HIP and MUSA targets, which can help narrow the gap with NVIDIA GPUs when running models such as LLaMA locally.
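To make the two ideas above concrete, here is a minimal CPU-only sketch of what a batched SGEMM computes and how a header macro can redirect a CUDA-style name to another implementation. Every name in it (`vendor_sgemm_batched`, `demo_sgemm_batched`, `run_demo`) is hypothetical. The argument list is simplified compared with the real cuBLAS call, which also takes a handle, transpose flags, and pointers for the scalars. It is not the code from the release, only an illustration of the technique.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a vendor BLAS entry point (for example a HIP or
// MUSA library routine). It is a plain CPU loop so the sketch runs anywhere.
// Matrices are column-major, as in cuBLAS, with no transposition:
//   C[b] = alpha * A[b] * B[b] + beta * C[b]   for b in [0, batch)
// A[b] is m x k, B[b] is k x n, C[b] is m x n.
inline void vendor_sgemm_batched(int m, int n, int k, float alpha,
                                 const float* const* A, int lda,
                                 const float* const* B, int ldb,
                                 float beta, float* const* C, int ldc,
                                 int batch) {
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int b = 0; b < batch; ++b) {
        for (int col = 0; col < n; ++col) {
            for (int row = 0; row < m; ++row) {
                float acc = 0.0f;
                for (int p = 0; p < k; ++p) {
                    acc += A[b][row + p * lda] * B[b][p + col * ldb];
                }
                float& out = C[b][row + col * ldc];
                out = alpha * acc + beta * out;
            }
        }
    }
}

// The vendor-header idea: calling code keeps one name, and a macro redirects
// it to the vendor's function. `demo_sgemm_batched` is a made-up name used
// here in place of the real cuBLAS symbol.
#define demo_sgemm_batched vendor_sgemm_batched

// Multiply two independent 2x2 problems in one call; returns both results
// back to back, each in column-major order.
inline std::vector<float> run_demo() {
    // Column-major layout: {a00, a10, a01, a11}
    std::vector<float> a0 = {1, 3, 2, 4};   // [[1, 2], [3, 4]]
    std::vector<float> b0 = {1, 0, 0, 1};   // identity
    std::vector<float> a1 = {2, 0, 0, 2};   // 2 * identity
    std::vector<float> b1 = {5, 7, 6, 8};   // [[5, 6], [7, 8]]
    std::vector<float> c(8, 0.0f);

    const float* A[] = {a0.data(), a1.data()};
    const float* B[] = {b0.data(), b1.data()};
    float* C[] = {c.data(), c.data() + 4};

    demo_sgemm_batched(2, 2, 2, 1.0f, A, 2, B, 2, 0.0f, C, 2, 2);
    return c;
}
```

Calling `run_demo()` performs both multiplications through the aliased name. On a GPU the benefit of the batched form is that many small products are issued as one library call instead of one call per matrix pair.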

The release also includes prebuilt binaries for a wide range of platforms: macOS Apple Silicon (with optional KleidiAI acceleration), macOS Intel, Linux on x64/arm64/s390x with CPU, Vulkan, ROCm 7.2, OpenVINO, and SYCL; Windows x64 and arm64 with CPU, OpenCL Adreno, CUDA 12/13, Vulkan, OpenVINO, SYCL, and HIP; plus Android arm64 CPU and openEuler builds. The commit is signed and verified. This broad support reinforces llama.cpp's role as the go-to tool for developer-run, local AI inference across diverse hardware ecosystems.

Key Points

New cublasSgemmBatched mapping in the HIP/MUSA vendor headers lets batched SGEMM calls build for AMD (HIP) and Moore Threads (MUSA) GPUs
Prebuilt binaries for macOS, Linux, Windows, Android, and more, including ROCm, Vulkan, and SYCL support
Release signed with verified GPG key from ggml-org, ensuring integrity and community trust

Why It Matters

Extends the CUDA-style code paths that llama.cpp can use on non-NVIDIA GPUs such as AMD's, giving people who run LLMs locally more hardware choices.
