SWE-rebench Jan 2026: GLM-5, MiniMax M2.5, Qwen3-Coder-Next, Opus 4.6, Codex Performance
The latest AI coding benchmark reveals a new leader and a surprisingly tight race at the top.
Deep Dive
The January SWE-rebench leaderboard, which tests models on 48 fresh GitHub PR tasks, shows Claude Code (Opus 4.6) leading with a 52.9% resolved rate and a 70.8% pass@5 score. Claude Opus 4.6 and GPT-5.2-xhigh follow closely at 51.7%, with GPT-5.2-medium at 51.0%. Among open models, Kimi K2 Thinking (43.8%), GLM-5 (42.1%), and Qwen3-Coder-Next (40.0%) lead the field, while MiniMax M2.5 offers strong performance at a low cost.
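The gap between the 52.9% resolved rate and the 70.8% pass@5 reflects how pass@k is computed. Assuming SWE-rebench follows the standard unbiased pass@k estimator (the HumanEval-style definition: the probability that at least one of k sampled attempts resolves the task), a minimal sketch looks like this; the function name and its use here are illustrative, not taken from the benchmark's code:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for one task.

    n: total attempts run, c: attempts that resolved the task,
    k: sample size. Returns the probability that at least one of
    k attempts drawn (without replacement) from the n succeeds.
    """
    if n - c < k:
        # Fewer than k failures exist, so any k-sample contains a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical task with 10 attempts, 3 resolved:
# pass@5 is far above the 30% per-attempt rate.
print(pass_at_k(10, 3, 5))
```

A model that resolves a task on even one of several attempts gets full credit for that task under pass@5, which is why pass@5 sits well above the single-attempt resolved rate.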
Why It Matters
This benchmark tracks the rapid evolution of AI coding assistants: performance gaps at the top are narrowing, and cost-effective open models are becoming viable contenders.