llama.cpp b9394 fixes n_head_kv default for better model compatibility
New build falls back to n_head when a model does not specify n_head_kv, so attention head configuration needs less manual setup for local LLMs.
Deep Dive
llama.cpp build b9394 (from ggml-org) updates the mtmd (multimodal) backend so that n_head_kv, the number of key/value attention heads, defaults to n_head when no value is given (commit #23782).
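The fallback described above can be sketched in a few lines. This is an illustrative sketch only, not the code from the commit: the function name `resolve_n_head_kv` and the dictionary layout are assumptions made for the example, and only the key names `n_head` and `n_head_kv` come from the release note.

```python
from typing import Mapping, Optional


def resolve_n_head_kv(hparams: Mapping[str, int]) -> int:
    """Return the key/value head count, falling back to n_head if absent.

    Hypothetical helper: the real llama.cpp logic lives in C++ and may
    differ in detail.
    """
    n_head = hparams["n_head"]
    n_head_kv: Optional[int] = hparams.get("n_head_kv")
    if n_head_kv is None:
        # No explicit value: treat the model as standard multi-head attention.
        return n_head
    return n_head_kv


# An explicit value is kept; a missing value falls back to n_head.
explicit = resolve_n_head_kv({"n_head": 32, "n_head_kv": 8})
fallback = resolve_n_head_kv({"n_head": 16})
```

With this kind of default, an explicit `n_head_kv` always takes precedence, and only models that omit the value are affected.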
Key Points
- Commit #23782 in b9394 defaults n_head_kv to n_head for the mtmd backend, fixing head-count mismatch issues.
- First complete build matrix includes macOS, Linux, Windows, Android, and openEuler with GPU backends.
- Reduces friction when running models locally whose metadata omits n_head_kv; multi-query and grouped-query attention models that set it explicitly keep their own value.
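For readers unfamiliar with the attention variants named above, they differ only in how n_head_kv relates to n_head: equal counts give standard multi-head attention, a single key/value head gives multi-query attention, and anything in between gives grouped-query attention. The sketch below classifies a configuration on that basis. The function name `attention_variant` is hypothetical and is not part of llama.cpp.

```python
def attention_variant(n_head: int, n_head_kv: int) -> str:
    """Classify an attention layout from its query and key/value head counts."""
    # Query heads must split evenly across key/value heads.
    if n_head <= 0 or n_head_kv <= 0 or n_head % n_head_kv != 0:
        raise ValueError(
            f"n_head ({n_head}) must be a positive multiple of n_head_kv ({n_head_kv})"
        )
    if n_head_kv == n_head:
        return "multi-head"
    if n_head_kv == 1:
        return "multi-query"
    return "grouped-query"


variants = [attention_variant(32, kv) for kv in (32, 8, 1)]
```

A head-count mismatch of the kind the release note mentions corresponds to the `ValueError` branch here, where the query heads cannot be divided evenly among the key/value heads.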
Why It Matters
Simplifies local LLM deployment by filling in a missing n_head_kv value automatically, so fewer models need manual attention head configuration.