llama.cpp b9367 accelerates Vulkan matmul with new decode vector extension

llama.cpp Releases May 28, 2026

⚡ NVIDIA Vulkan users get faster LLM inference through the cooperative matrix decode vector extension.

Deep Dive

ggml-org has released llama.cpp b9367, which speeds up Vulkan matrix multiplication by using the GL_NV_cooperative_matrix_decode_vector extension. Prebuilt binaries are provided for macOS (Apple Silicon and Intel), Linux (x64, arm64, Vulkan, ROCm, OpenVINO), Android (arm64), and Windows (CPU, CUDA 12/13, Vulkan, HIP). Some builds, including Linux SYCL, are disabled in this release. The release is signed with a verified GPG key.
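For readers who want to see what their own driver exposes, here is a minimal Python sketch that lists the cooperative matrix extensions reported by the `vulkaninfo` tool. It rests on several assumptions: `vulkaninfo` (from the Vulkan SDK or vulkan-tools) is on the PATH, the helper names `find_cooperative_matrix_extensions` and `read_vulkaninfo` are hypothetical, and the release notes do not say which device-level `VK_*` extension backs the GLSL-level GL_NV_cooperative_matrix_decode_vector. The script therefore only reports what the driver advertises and does not confirm that the new code path is active.

```python
import re
import shutil
import subprocess


def find_cooperative_matrix_extensions(vulkaninfo_text: str) -> list[str]:
    """Return the sorted, de-duplicated VK_*_cooperative_matrix* names found in the text."""
    pattern = r"\bVK_[A-Z]+_cooperative_matrix\w*\b"
    return sorted(set(re.findall(pattern, vulkaninfo_text)))


def read_vulkaninfo() -> str:
    """Run vulkaninfo if it is installed; return an empty string on any failure."""
    exe = shutil.which("vulkaninfo")
    if exe is None:
        return ""
    try:
        result = subprocess.run(
            [exe], capture_output=True, text=True, check=False, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout


if __name__ == "__main__":
    found = find_cooperative_matrix_extensions(read_vulkaninfo())
    if found:
        print("Cooperative matrix extensions reported by the driver:")
        for name in found:
            print(" ", name)
    else:
        print("No cooperative matrix extensions reported (or vulkaninfo not available).")
```

An empty result may simply mean the tool is missing or the driver is out of date, so it is a hint rather than proof that the faster matmul path is unavailable.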

Key Points

Uses GL_NV_cooperative_matrix_decode_vector Vulkan extension for faster matmul on NVIDIA GPUs
Supports 20+ platform builds including macOS, Linux, Windows, Android, and various GPU backends
Release b9367 is signed with a verified GPG key for security

Why It Matters

Speeds up local LLM inference for NVIDIA Vulkan users, making on-device AI more practical.
