Viral Wire

MiniMax M2: 229.9B-parameter MoE with only 9.8B active params

YouTube (AI Paper Slob), arXiv May 31, 2026

⚡ A 229.9B-parameter model that activates only 9.8B parameters per token.

Deep Dive

MiniMax introduces the MiniMax-M2 series, a family of Mixture-of-Experts language models built around the principle that mini activations can unleash maximum real-world intelligence. The flagship M2 contains 229.9B total parameters, of which only 9.8B are activated per token. Designed end-to-end for agentic deployment, the series rests on three components: agent-driven data pipelines that produce large-scale, verifiable trajectories across agentic coding and agentic cowork; Forge, a scalable agent-native RL system; and the latest M2.7 checkpoint, which takes an early step toward self-evolution by autonomously debugging training runs and modifying its own scaffold. Across M2

Key Points

M2 has 229.9B total parameters but activates only 9.8B per token, roughly 4% of the model.
The series is designed end-to-end for agentic deployment and trained with Forge, a scalable agent-native RL system.
The M2.7 checkpoint takes an early step toward self-evolution: it autonomously debugs training runs and modifies its own scaffold.
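
To illustrate how a Mixture-of-Experts layer can run only a few of its parameters for each token, here is a minimal top-k routing sketch. It is not MiniMax's implementation: the summary above does not describe M2's router, expert count, or number of active experts, so the values below (8 experts, 2 active, scalar "experts") and the function names are hypothetical and chosen only for readability.

```python
import math
import random


def top_k_route(logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the k selected experts; the other experts are never called."""
    out = 0.0
    for idx, weight in top_k_route(router_logits, k):
        out += weight * experts[idx](x)
    return out


# Hypothetical toy setup (not M2's configuration): 8 experts, 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
# Each toy expert just scales its input by a different constant.
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes = top_k_route(router_logits, TOP_K)
y = moe_forward(1.0, experts, router_logits, TOP_K)
print("selected experts and weights:", routes)
print("layer output:", y)
```

In this toy layer every expert's parameters exist, but only 2 of the 8 are evaluated per token. That gap between stored and evaluated experts is the mechanism behind a large total parameter count with a small active count.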

Why It Matters

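Because only a small share of the parameters run for each token, per-token compute tracks the 9.8B active parameters rather than the 229.9B total, which should lower the cost of serving long-running agentic workloads.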
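
The scale of that saving can be estimated from the two parameter counts alone. The sketch below uses the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass. That rule is an assumption, not a figure from the source, and it ignores attention and routing overhead, so treat the result as a rough upper bound on the saving rather than a measured number.

```python
TOTAL_PARAMS = 229.9e9   # total parameters, from the summary above
ACTIVE_PARAMS = 9.8e9    # parameters activated per token, from the summary above

# Assumed rule of thumb (not from the source): ~2 FLOPs per active
# parameter per token for a forward pass, ignoring attention cost.
FLOPS_PER_PARAM = 2

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops_per_token = FLOPS_PER_PARAM * ACTIVE_PARAMS
dense_flops_per_token = FLOPS_PER_PARAM * TOTAL_PARAMS
compute_ratio = dense_flops_per_token / moe_flops_per_token

print(f"active fraction: {active_fraction:.1%}")
print(f"dense-equivalent compute ratio: {compute_ratio:.1f}x")
```

The saving applies to compute per token, not to memory: all 229.9B parameters still have to be stored and kept reachable, since any expert may be selected for the next token.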
