llama.cpp b9367 accelerates Vulkan matmul with new decode vector extension
NVIDIA Vulkan users get faster LLM inference through the cooperative matrix decode vector extension.
Deep Dive
ggml-org released llama.cpp b9367, which speeds up Vulkan matrix multiplication by using the GL_NV_cooperative_matrix_decode_vector extension. Prebuilt binaries are provided for macOS (Apple Silicon and Intel), Linux (x64, arm64, Vulkan, ROCm, OpenVINO), Android (arm64), and Windows (CPU, CUDA 12/13, Vulkan, HIP). Some builds, including Linux SYCL, are disabled in this release. The release is signed with a verified GPG key.
Key Points
- Uses the GL_NV_cooperative_matrix_decode_vector extension for faster Vulkan matmul on NVIDIA GPUs
- Supports 20+ platform builds including macOS, Linux, Windows, Android, and various GPU backends
- Release b9367 is signed with a verified GPG key for security
Why It Matters
The change speeds up local LLM inference for NVIDIA users running the Vulkan backend, making on-device AI more practical.