MiniMax M2.7 is on par with GPT-5.4 & Opus 4.6 on most benchmarks 🤖
The new model achieves competitive coding scores while being dramatically cheaper to run than top-tier rivals.
MiniMax, a prominent Chinese AI lab, has released its M2.7 model, positioning it as a formidable, cost-efficient competitor to the industry's leading large language models. Benchmarks show the model on par with OpenAI's GPT-5.4 and Anthropic's Claude Opus 4.6 in critical areas such as coding and agentic tasks. On the SWE Bench Pro test, M2.7 scored 56.2%, beating Google's Gemini 3.1 Pro (54.2%) and coming close to Claude Sonnet 4.6 (57.2%) and GPT-5.4 (57.7%). It also leads on the Multi-SWE Bench with a score of 52.7%. For agentic tasks, where the AI uses tools and takes actions, it scored 62.7% on MM-ClawBench, remaining competitive with more expensive models.
The most disruptive aspect of M2.7 is its staggering cost efficiency. While delivering comparable performance, it is priced at a fraction of its rivals: output tokens cost $1.20 per million, roughly one-twentieth (20.8x cheaper) of Claude Opus 4.6's $25 per million, and input tokens are 16.7x cheaper. This price-performance ratio challenges the prevailing market dynamic, in which top-tier capability has commanded a premium. The main trade-off is a context window roughly one-fifth the size of Opus 4.6's, but for many cost-sensitive development and agent-deployment scenarios, M2.7 presents a compelling value proposition.
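To make that ratio concrete, here is a back-of-envelope sketch. The per-token prices are the figures cited above; the monthly token volume is a hypothetical workload assumption, not reported data:

```python
# Back-of-envelope cost comparison. Prices are the article's figures;
# the monthly token volume is a hypothetical workload, not real data.
OPUS_OUTPUT_PER_M = 25.0   # $ per 1M output tokens, Claude Opus 4.6
M27_OUTPUT_PER_M = 1.2     # $ per 1M output tokens, MiniMax M2.7

output_tokens = 500_000_000  # assume an agent fleet emits 500M tokens/month

opus_cost = output_tokens / 1e6 * OPUS_OUTPUT_PER_M
m27_cost = output_tokens / 1e6 * M27_OUTPUT_PER_M

print(f"Opus 4.6 output cost: ${opus_cost:,.0f}/month")     # $12,500/month
print(f"M2.7 output cost:     ${m27_cost:,.0f}/month")      # $600/month
print(f"Price ratio:          {opus_cost / m27_cost:.1f}x")  # 20.8x
```

Under this assumed volume, the same output traffic costs $12,500 a month on Opus 4.6 versus $600 on M2.7.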
A notable technical achievement highlighted by MiniMax is that M2.7 is the first model to have "deeply participated in its own self-evolution." This suggests the company used advanced reinforcement learning (RL) training loops where the AI assisted in optimizing its own architecture and training process, a cutting-edge approach in model development. This could explain its efficiency gains. The model's strong showing, particularly in coding intelligence benchmarks, establishes MiniMax as a serious player offering a viable alternative for developers and companies looking to deploy capable AI agents without the high operational costs of market leaders.
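MiniMax has not published details of this loop, so the toy sketch below only illustrates the general idea, and it is closer to evolutionary hill-climbing than a full RL pipeline: the model proposes mutations to its own training recipe, each candidate is trained and scored, and the best recipe survives. Every function name and number here is invented for illustration, not MiniMax's method:

```python
import random

random.seed(0)

def train_and_eval(cfg):
    # Toy stand-in for a short training run plus a benchmark eval.
    # Scores peak at lr=1e-4 and code_frac=0.5; purely illustrative.
    return -abs(cfg["lr"] - 1e-4) * 1e4 - abs(cfg["code_frac"] - 0.5)

def propose(base):
    # Hypothetical "self-evolution" step: in MiniMax's framing, the
    # model itself would suggest these mutations to its own recipe.
    return {
        "lr": base["lr"] * random.choice([0.5, 1.0, 2.0]),
        "code_frac": min(1.0, max(0.0, base["code_frac"]
                                   + random.choice([-0.1, 0.0, 0.1]))),
    }

best = {"lr": 3e-4, "code_frac": 0.3}   # invented starting recipe
best_score = train_and_eval(best)
for generation in range(20):
    candidate = propose(best)
    score = train_and_eval(candidate)
    if score > best_score:              # keep the best recipe found so far
        best, best_score = candidate, score

print(best, round(best_score, 4))
```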
- Competes with top models: Scores 56.2% on SWE Bench Pro, near GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%).
- Unmatched cost efficiency: Output tokens are 20.8x cheaper than Claude Opus 4.6's, at $1.20 vs. $25 per million.
- Self-evolved development: Claimed by MiniMax to be the first model to participate in its own RL training loops, helping optimize its own architecture and training.
Why It Matters
It dramatically lowers the cost of deploying high-performance AI for coding and autonomous agents, increasing accessibility and competition.