MiniMax M2: 229.9B-parameter MoE with only 9.8B active params
A 229.9B-parameter model that activates just 9.8B parameters per token.
MiniMax introduces the MiniMax-M2 series, a family of Mixture-of-Experts (MoE) language models built around the principle that mini activations can unleash maximum real-world intelligence. The flagship M2 contains 229.9B total parameters, of which only 9.8B are activated per token. Designed end-to-end for agentic deployment, the series rests on three components: agent-driven data pipelines that produce large-scale, verifiable trajectories across agentic coding and agentic cowork; Forge, a scalable agent-native RL system; and the latest M2.7 checkpoint, which takes an early step toward self-evolution by autonomously debugging training runs and modifying its own scaffold. Across M2
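- M2 has 229.9B total parameters but activates only 9.8B (about 4.3%) per token, so per-token compute scales with the active rather than the total parameter count.
- Designed end-to-end for agentic deployment and built on Forge, a scalable agent-native RL system.
- The M2.7 checkpoint takes an early step toward self-evolution, autonomously debugging training runs and modifying its own scaffold.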
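The efficiency claim rests on sparse activation: a router sends each token to only a few experts, so most parameters sit idle for that token. The following is a minimal sketch of generic top-k expert routing in pure Python, assuming a toy layer. The source does not describe M2's router, expert count, or top-k, so `TopKMoE`, `num_experts=8`, `top_k=2`, and `dim=4` are hypothetical illustration values, and each "expert" here is a single linear map rather than a full MLP.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    # w is a list of rows; returns the matrix-vector product w @ x.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class TopKMoE:
    """Hypothetical toy MoE layer: a router picks top_k of num_experts per token.

    The sizes are illustrative only and are NOT MiniMax M2's configuration.
    """

    def __init__(self, num_experts=8, top_k=2, dim=4, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.dim = dim
        # Router weights: one row per expert, so matvec yields one logit per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        # Each expert is a single dim x dim linear map (real experts are MLPs).
        self.experts = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def forward(self, x):
        logits = matvec(self.router, x)
        # Keep only the top_k highest-scoring experts for this token.
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = ranked[: self.top_k]
        # Renormalize gate weights over the chosen experts only.
        gates = softmax([logits[i] for i in chosen])
        out = [0.0] * self.dim
        for g, i in zip(gates, chosen):
            y = matvec(self.experts[i], x)  # only chosen experts are evaluated
            out = [o + g * yi for o, yi in zip(out, y)]
        return out, chosen

    def param_counts(self):
        # Total parameters stored vs. parameters actually used for one token.
        router = len(self.router) * self.dim
        per_expert = self.dim * self.dim
        total = router + len(self.experts) * per_expert
        active = router + self.top_k * per_expert
        return total, active


layer = TopKMoE(num_experts=8, top_k=2, dim=4)
out, chosen = layer.forward([1.0, 0.5, -0.5, 2.0])
total, active = layer.param_counts()
print("experts used:", chosen)
print(f"active params: {active}/{total}")  # active params: 64/160
```

In this toy layer 40% of parameters are active per token; a production MoE uses many more experts relative to top-k, which is how a ratio like M2's 9.8B of 229.9B becomes possible.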
Why It Matters
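By activating only about 4.3% of its parameters per token, an efficient MoE design aims to deliver frontier-level agentic AI at a fraction of the compute cost of a dense model of the same size.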
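As a rough back-of-envelope check, assuming the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token (ignoring attention and routing overhead, which this estimate does not model), the published parameter counts give:

```latex
\frac{N_{\text{active}}}{N_{\text{total}}} = \frac{9.8\,\text{B}}{229.9\,\text{B}} \approx 0.043,
\qquad
C_{\text{MoE}} \approx 2N_{\text{active}} \approx 1.96 \times 10^{10}\ \text{FLOPs/token},
\qquad
C_{\text{dense}} \approx 2N_{\text{total}} \approx 4.6 \times 10^{11}\ \text{FLOPs/token},
\qquad
\frac{C_{\text{dense}}}{C_{\text{MoE}}} \approx 23.5
```

The saving applies to per-token compute; in a typical deployment all experts must still be held in memory, so the storage footprint tracks the full 229.9B parameters.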