Developer Tools

llama.cpp Releases May 29, 2026

⚡ New build fills in the KV head count automatically for multimodal models run locally.

Deep Dive

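llama.cpp b9394 (by ggml-org) updates mtmd, the multimodal library, so that n_head_kv defaults to n_head (commit #23782). Here n_head is the number of attention query heads and n_head_kv is the number of key/value heads. Equal counts give standard multi-head attention, a single KV head gives multi-query attention, and values in between give grouped-query attention. With the new default, a model that does not specify its KV head count is treated as standard multi-head attention.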
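The fallback can be sketched in a few lines of Python. This is a minimal illustration that assumes model metadata arrives as a plain dictionary. The key names and the resolve_kv_heads helper are hypothetical, not the GGUF keys or functions mtmd actually uses. Treating a stored value of 0 as "unset" is also an assumption.

```python
# Hypothetical sketch: key names and helper are illustrative only,
# not the actual GGUF keys or functions used by mtmd.

def resolve_kv_heads(metadata: dict) -> tuple[int, int]:
    """Return (n_head, n_head_kv), defaulting n_head_kv to n_head."""
    n_head = metadata["attention.head_count"]
    if n_head <= 0:
        raise ValueError(f"n_head must be positive, got {n_head}")
    # Assumption: a missing key or a stored 0 both mean "not specified".
    n_head_kv = metadata.get("attention.head_count_kv", 0)
    if n_head_kv == 0:
        n_head_kv = n_head  # fall back to standard multi-head attention
    # Each KV head must serve a whole number of query heads.
    if n_head_kv < 0 or n_head % n_head_kv != 0:
        raise ValueError(
            f"n_head={n_head} is not a multiple of n_head_kv={n_head_kv}"
        )
    return n_head, n_head_kv


print(resolve_kv_heads({"attention.head_count": 16}))  # (16, 16)
print(resolve_kv_heads({"attention.head_count": 32,
                        "attention.head_count_kv": 8}))  # (32, 8)
```

The divisibility check is an extra safeguard added for this sketch. It rejects head-count mismatches early instead of letting them surface later in the attention computation.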

Key Points

Commit #23782 in b9394 defaults n_head_kv to n_head in mtmd, fixing head-count mismatch issues.
The first complete build matrix covers macOS, Linux, Windows, Android, and openEuler, with GPU backends.
Reduces friction when running models locally whose metadata omits the KV head count.
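The way n_head_kv shapes multi-query and grouped-query attention can be shown with a short, generic sketch. It assumes the common convention in which consecutive query heads share one KV head. The helper names are hypothetical, and the code is not taken from llama.cpp.

```python
# Generic grouped-query attention head mapping; function names are
# hypothetical and not taken from llama.cpp.

def kv_head_for(q_head: int, n_head: int, n_head_kv: int) -> int:
    """Index of the KV head that query head q_head attends with."""
    group_size = n_head // n_head_kv  # query heads sharing one KV head
    return q_head // group_size


def attention_kind(n_head: int, n_head_kv: int) -> str:
    """Name the attention variant implied by the two head counts."""
    if n_head_kv == n_head:
        return "multi-head"
    if n_head_kv == 1:
        return "multi-query"
    return "grouped-query"


n_head = 8
for n_head_kv in (8, 2, 1):
    mapping = [kv_head_for(q, n_head, n_head_kv) for q in range(n_head)]
    print(attention_kind(n_head, n_head_kv), mapping)
# multi-head [0, 1, 2, 3, 4, 5, 6, 7]
# grouped-query [0, 0, 0, 0, 1, 1, 1, 1]
# multi-query [0, 0, 0, 0, 0, 0, 0, 0]
```

When n_head_kv equals n_head, as under the new default, every query head keeps its own KV head. That is why the default suits standard multi-head attention, while multi-query and grouped-query models still depend on a correct KV head count.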

Simplifies local deployment by supplying a default KV head count when a model's metadata leaves it out.