GPT-5.4 tops SWE-bench at 82% pass rate with 2M token context window?

GPT-5.4 tops SWE-bench at 82% pass rate with 2M token context window.

Claude Opus 4.6 leads ARC-AGI-2 reasoning benchmark at 67.3%?

Claude Opus 4.6 leads ARC-AGI-2 reasoning benchmark at 67.3%.

Gemini 3.1 Pro wins human-eval code generation at 91.4% approval; GLM-5 ranks 2nd in multilingual reasoning.

Models & Releases

Buildfastwithai May 28, 2026

⚡OpenAI's GPT-5.4, Google's Gemini 3.1 Pro, and Anthropic's Claude Opus 4.6 battle for top spots—here are the numbers.

Deep Dive

GPT-5.4, Gemini 3.1 Pro, Claude Opus 4.6, and GLM-5 are ranked by SWE-bench, ARC-AGI-2, and real-world scores in an April 2026 breakdown.

Key Points

GPT-5.4 tops SWE-bench at 82% pass rate with 2M token context window.
Claude Opus 4.6 leads ARC-AGI-2 reasoning benchmark at 67.3%.
Gemini 3.1 Pro wins human-eval code generation at 91.4% approval; GLM-5 ranks 2nd in multilingual reasoning.

Enterprise teams can now choose models optimized for coding, reasoning, or cost—no single AI fits all tasks.