Open Source

SWE-rebench Jan 2026: GLM-5, MiniMax M2.5, Qwen3-Coder-Next, Opus 4.6, Codex Performance

The latest AI coding benchmark reveals a new leader and a surprisingly tight race at the top.

Deep Dive

The January SWE-rebench leaderboard, which tests models on 48 fresh GitHub PR tasks, shows Claude Code (Opus 4.6) leading with a 52.9% resolved rate and a 70.8% pass@5 score. Claude Opus 4.6 and GPT-5.2-xhigh follow closely at 51.7%, with GPT-5.2-medium at 51.0%. Among open models, Kimi K2 Thinking (43.8%), GLM-5 (42.1%), and Qwen3-Coder-Next (40.0%) lead, while MiniMax M2.5 offers strong performance at a low cost.
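For readers unfamiliar with the two metrics: the resolved rate is the fraction of tasks solved in a single attempt, while pass@5 estimates the probability of solving a task given five attempts. The source does not specify how SWE-rebench computes pass@5, but a common approach is the standard unbiased pass@k estimator; the sketch below assumes that estimator (the sample numbers are illustrative, not from the leaderboard):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k attempts succeeds, given n total attempts of which c passed.
    Computed as 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical task: 10 attempts sampled, 4 of them resolved the issue.
print(round(pass_at_k(10, 4, 5), 3))  # → 0.976
```

A per-model pass@5 score would then be this value averaged over all 48 tasks, which is why pass@5 sits well above the single-attempt resolved rate.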

Why It Matters

These results illustrate the rapid evolution of AI coding assistants: performance gaps at the top are narrowing, and cost-effective open models are becoming viable contenders.