
llama.cpp b9804 updates Mamba2 with flexible expansion factor fix

Mamba2 now supports any expansion value, not just 2x

Deep Dive

llama.cpp, the widely used C++ inference engine for large language models, released version b9804 on June 26. The update focuses on Mamba2, a state space model architecture designed for efficient sequence processing. Previously, the code hardcoded an expansion factor of 2 for the inner dimension, so models with other configurations required manual workarounds. The new release removes this constraint and accepts any expand value. It also drops an invalid d_inner % d_state check. d_inner (the inner dimension) and d_state (the state dimension) are unrelated parameters, so neither should constrain the other.
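The relationship between the expansion factor and the inner dimension can be sketched in a few lines of Python. This is a minimal illustration, not llama.cpp code. It assumes the usual Mamba convention d_inner = expand * d_model. The function names, the head_dim parameter, and the exact form of the legacy checks are hypothetical. The sketch contrasts a sensible constraint (d_inner must split evenly into heads) with the two removed ones.

```python
# Hypothetical sketch, not llama.cpp source: assumes d_inner = expand * d_model.


def mamba2_inner_dims(d_model: int, expand: int, head_dim: int) -> tuple[int, int]:
    """Return (d_inner, n_heads) for any integer expansion factor."""
    d_inner = expand * d_model  # the expansion factor scales the model width
    # A meaningful constraint: the inner dimension must split evenly into heads.
    if d_inner % head_dim != 0:
        raise ValueError(f"d_inner={d_inner} is not divisible by head_dim={head_dim}")
    return d_inner, d_inner // head_dim


def legacy_check_passes(d_model: int, expand: int, d_state: int) -> bool:
    """Assumed form of the two removed constraints, for comparison only."""
    if expand != 2:  # old behavior: only a 2x expansion was accepted
        return False
    d_inner = expand * d_model
    # Invalid coupling: d_state has nothing to do with the width of d_inner.
    return d_inner % d_state == 0


if __name__ == "__main__":
    print(mamba2_inner_dims(768, 3, 64))      # (2304, 36)
    print(mamba2_inner_dims(800, 2, 64))      # (1600, 25)
    print(legacy_check_passes(800, 2, 128))   # False: 1600 % 128 == 64
    print(legacy_check_passes(768, 3, 128))   # False: expand != 2
```

For example, with d_model = 800, expand = 2, head_dim = 64 and d_state = 128, the inner dimension of 1600 splits cleanly into 25 heads. The legacy-style check still fails, because 1600 is not a multiple of 128.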

The changes span multiple files. The core Mamba2 implementation no longer enforces a 2x expansion, and the conversion script (convert_hf_to_gguf.py) now treats the expansion factor as optional, defaulting to 2 for backward compatibility. The refactored mamba.py includes the same support. These fixes, co-authored by Sigbjørn Skjæret, let users load Mamba2 models with custom expansion ratios without hitting runtime errors. For developers working with state space models, the update makes llama.cpp more flexible and reduces friction when experimenting with different architectures.
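The optional expansion factor in the conversion step can be sketched as a config lookup with a fallback. This is a hypothetical helper, not the code in convert_hf_to_gguf.py. The key names hidden_size, d_model and expand are assumptions about how a Hugging Face style config.json might store these values.

```python
import json

# Assumed default, matching the backward-compatible value described above.
DEFAULT_EXPAND = 2


def first_present(hparams: dict, keys: list[str]):
    """Return the value of the first key found in hparams (hypothetical helper)."""
    for key in keys:
        if key in hparams:
            return hparams[key]
    raise KeyError(f"none of {keys} found in config")


def read_mamba2_hparams(config_text: str) -> dict:
    """Parse a config.json string and derive Mamba2 widths (illustrative sketch)."""
    hparams = json.loads(config_text)
    # Assumed key names: some configs say "hidden_size", others "d_model".
    d_model = first_present(hparams, ["hidden_size", "d_model"])
    # The expansion factor is optional; fall back to 2 when it is absent.
    expand = hparams.get("expand", DEFAULT_EXPAND)
    return {"d_model": d_model, "expand": expand, "d_inner": expand * d_model}


if __name__ == "__main__":
    print(read_mamba2_hparams('{"hidden_size": 768}'))
    # {'d_model': 768, 'expand': 2, 'd_inner': 1536}
    print(read_mamba2_hparams('{"d_model": 512, "expand": 4}'))
    # {'d_model': 512, 'expand': 4, 'd_inner': 2048}
```

Using dict.get with a default keeps older configs that omit the key converting exactly as before. Configs that set expand explicitly get the width they declare.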

Key Points
  • Removed the hardcoded 2x expansion factor in Mamba2, which now supports any expand value
  • Eliminated an invalid d_inner % d_state check that raised spurious errors, since the two dimensions are unrelated
  • Updated the Hugging Face conversion scripts to accept an optional expansion factor that defaults to 2

Why It Matters

Mamba2 checkpoints whose expansion factor is not 2 can now be converted and loaded without workarounds, which means fewer conversion errors and more room to experiment with model variants.
