Developer Tools

llama.cpp b9367 accelerates Vulkan matmul with new decode vector extension

NVIDIA Vulkan users get faster LLM inference through the cooperative matrix decode vector extension.

Deep Dive

ggml-org has released llama.cpp b9367, which speeds up Vulkan matrix multiplication by using the GL_NV_cooperative_matrix_decode_vector extension. Prebuilt binaries are provided for macOS (Apple Silicon and Intel), Linux (x64, arm64, Vulkan, ROCm, OpenVINO), Android (arm64), and Windows (CPU, CUDA 12/13, Vulkan, HIP). Some builds, including Linux SYCL, are disabled in this release. The release is signed with a verified GPG key.

Key Points
  • Uses the GL_NV_cooperative_matrix_decode_vector extension to speed up Vulkan matmul on NVIDIA GPUs
  • Ships 20+ prebuilt binaries across macOS, Linux, Windows, and Android, covering several GPU backends
  • Release b9367 is signed with a verified GPG key, so its origin can be checked
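
To see whether a given GPU driver exposes any cooperative-matrix support at all, one option is to scan a Vulkan capability report (for example, text saved from the `vulkaninfo` tool) for matching extension names. The sketch below is illustrative only: the function name `find_cooperative_matrix_extensions` is hypothetical, and because the release notes give the GL_NV_ identifier rather than a Vulkan device extension name, the pattern simply matches any `VK_<vendor>_cooperative_matrix...` entry instead of assuming a specific one. It does not tell you which extension llama.cpp actually checks for.

```python
import re

# Assumption: device extensions appear in the report as VK_<VENDOR>_<name>.
# We match any extension whose name starts with "cooperative_matrix" after
# the vendor tag, rather than guessing one exact extension name.
_COOPMAT_PATTERN = re.compile(r"\bVK_[A-Za-z0-9]+_cooperative_matrix[A-Za-z0-9_]*")


def find_cooperative_matrix_extensions(report: str) -> list[str]:
    """Return the sorted, de-duplicated cooperative-matrix extension names
    found in a plain-text Vulkan capability report."""
    return sorted(set(_COOPMAT_PATTERN.findall(report)))


if __name__ == "__main__":
    # Hand-written sample text standing in for a real capability report.
    sample_report = (
        "VK_KHR_cooperative_matrix : extension revision 2\n"
        "VK_NV_cooperative_matrix2 : extension revision 1\n"
        "VK_KHR_swapchain : extension revision 70\n"
    )
    for name in find_cooperative_matrix_extensions(sample_report):
        print(name)
```

An empty result suggests the driver reports no cooperative-matrix extensions, in which case this release's Vulkan matmul optimization is unlikely to apply on that device.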

Why It Matters

Matrix multiplication dominates the cost of LLM inference, so a faster Vulkan matmul path speeds up local inference for NVIDIA Vulkan users and makes on-device AI more practical.