Models & Releases

Gemini 3.1 Pro Crushes Benchmarks – 77% on ARC-AGI-2, More Than Double Its Predecessor!

Gemini 3.1 Pro scores 77% on ARC-AGI-2, while Claude Opus 4.6 and GPT-5.3 Codex bring major speed and reasoning upgrades.

Deep Dive

February 2026 was a landmark month for frontier AI models, headlined by Google DeepMind's Gemini 3.1 Pro. The natively multimodal reasoning model, now in preview, delivered a stunning 77.1% score on the challenging ARC-AGI-2 benchmark, more than doubling the performance of Gemini 3 Pro, and set a new record on the GPQA Diamond benchmark at 94.3%. Priced at $2 per million input tokens, it signals Google's aggressive push in the high-reasoning model space. General availability is coming soon, and the older Gemini 3 Pro Preview will be discontinued on March 9.
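For a sense of what that price point means in practice, here is a back-of-envelope cost calculation. Only the $2-per-million-input-tokens figure comes from the article; the token counts and the helper function are illustrative, and output-token pricing (not quoted) is ignored.

```python
# Input-side API cost at the quoted $2 per million input tokens.
# Output-token pricing was not quoted, so this covers input cost only.
INPUT_PRICE_PER_MILLION = 2.00  # USD, figure quoted in the article

def input_cost(tokens: int, price_per_million: float = INPUT_PRICE_PER_MILLION) -> float:
    """Return the input-side cost in USD for a request of `tokens` tokens."""
    return tokens / 1_000_000 * price_per_million

# A 100k-token prompt (e.g., a long document) costs 20 cents on the input side.
print(f"${input_cost(100_000):.2f}")  # → $0.20
```

At this rate, even very long-context workloads stay in the cents-per-request range on the input side.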

The competitive landscape intensified with Anthropic's Claude Opus 4.6, which boasts a 1M-token context window and a 144-point lead over GPT-5.2 in knowledge-work Elo ratings. Meanwhile, OpenAI countered with GPT-5.3 Codex, offering a 25% speed increase and a 48% token-efficiency gain over GPT-5.2-Codex, alongside a new 'high capability' classification in cybersecurity. Other notable releases included xAI's Grok 4.20, with a parallel-agents architecture, and continued strong showings from Chinese labs like Zhipu AI's GLM-5 and Alibaba's Qwen 3.5, keeping pressure on both the open-source and proprietary frontiers.

Key Points
  • Gemini 3.1 Pro scores 77.1% on ARC-AGI-2, more than double its predecessor's 31.1%.
  • Claude Opus 4.6 offers a 1M-token context window and leads GPT-5.2 by 144 Elo points on knowledge work.
  • GPT-5.3 Codex runs 25% faster and uses 48% fewer tokens than GPT-5.2-Codex for a 2.6x throughput gain.
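The headline multiplier above can be reproduced with simple arithmetic, under one assumed reading of the figures: if generation is 1.25x faster per token and a task now needs roughly 48% of the baseline token count, the two effects compound. This decomposition is an illustration, not a published methodology from OpenAI.

```python
# Hedged sketch: combining a per-token speedup with a token reduction into an
# end-to-end throughput multiplier. Only the 25%, 48%, and 2.6x figures come
# from the article; the decomposition itself is an assumption.

def throughput_multiplier(speedup: float, token_fraction: float) -> float:
    """End-to-end gain when generation is `speedup`x faster per token and a
    task now needs `token_fraction` of the baseline token count."""
    return speedup / token_fraction

# Reading the efficiency gain as tasks needing ~48% of baseline tokens:
print(round(throughput_multiplier(1.25, 0.48), 1))  # → 2.6
```

Speed and token-count improvements multiply rather than add, which is why two moderate gains can yield a large combined throughput jump.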

Why It Matters

These leaps in reasoning, speed, and efficiency directly translate to more capable AI assistants, cheaper API costs, and new applications in research and cybersecurity.