Viral Wire

MiniMax M2: 229.9B-parameter MoE with only 9.8B active params

A massive 229.9B-parameter model activates just 9.8B parameters per token.

Deep Dive

MiniMax introduces the MiniMax-M2 series, a family of Mixture-of-Experts language models built around the principle that mini activations can unleash maximum real-world intelligence. The flagship M2 contains 229.9B total parameters, of which only 9.8B are activated per token. Designed end-to-end for agentic deployment, the series rests on three components: agent-driven data pipelines that produce large-scale, verifiable trajectories across agentic coding and agentic cowork; Forge, a scalable agent-native RL system; and the latest M2.7 checkpoint, which takes an early step toward self-evolution by autonomously debugging training runs and modifying its own scaffold. Across M2

Key Points
  • M2 has 229.9B total parameters but activates only 9.8B (about 4.3%) per token.
  • Designed end-to-end for agentic deployment with a scalable RL system called Forge.
  • The latest checkpoint, M2.7, takes an early step toward self-evolution: it autonomously debugs training runs and modifies its own scaffold.
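
To make the total-versus-active split concrete, here is a minimal Python sketch. The two parameter counts come from the article; everything else (the expert count, the top-k value, and the function names `active_fraction` and `top_k_route`) is a hypothetical illustration of generic top-k expert routing, not MiniMax's actual router or configuration.

```python
import math
import random

# Figures reported in the article (billions of parameters).
TOTAL_PARAMS_B = 229.9
ACTIVE_PARAMS_B = 9.8

# Hypothetical toy settings, NOT taken from MiniMax-M2.
NUM_EXPERTS = 8
TOP_K = 2


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of all parameters that participate in one token's forward pass."""
    return active_b / total_b


def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Returns (expert_index, weight) pairs, highest score first. Only these
    experts run for the token; the rest stay idle, which is why the active
    parameter count is far below the total.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top, exps)]


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active share per token: {frac:.1%}")

    rng = random.Random(0)  # fixed seed so the toy run is repeatable
    router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    for expert, weight in top_k_route(router_logits, TOP_K):
        print(f"expert {expert}: weight {weight:.3f}")
```

The sketch only illustrates the routing idea: per-token compute follows the experts that are selected, while all experts still have to be held in memory.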

Why It Matters

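Because only about 4% of the parameters are active for any given token, per-token compute tracks the 9.8B active parameters rather than the full 229.9B. That efficiency is what could make agentic AI at this scale far cheaper to run than a dense model of the same size.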