Google Releases Gemini 3.1 Pro, Beating Claude Opus 4.6 and GPT-5.2
Gemini 3.1 Pro achieves 77.1% on ARC-AGI-2, more than doubling its predecessor's abstract-reasoning score.
Google DeepMind has released Gemini 3.1 Pro, its latest flagship model, reclaiming the top spot across most industry-standard benchmarks and outperforming Anthropic's Claude Opus 4.6 and OpenAI's GPT-5.2. The model scored 94.3% on GPQA Diamond (expert-level science questions), 77.1% on ARC-AGI-2 abstract reasoning (more than double Gemini 3 Pro's score), and 80.6% on SWE-Bench Verified for agentic coding. On APEX-Agents, which measures long-horizon professional tasks, it posted 33.5%, nearly double its predecessor's 18.4% and well ahead of Opus 4.6's 29.8% and GPT-5.2's 23.0%. Rivals held ground in specific areas, however: Claude Sonnet 4.6 tied it on MRCR v2 long-context retrieval (84.9%) and led on GDPval-AA Elo (1633 vs. 1317), while OpenAI's GPT-5.3-Codex topped terminal coding with 77.3% on Terminal-Bench 2.0 and 56.8% on SWE-Bench Pro.
Starting today, Gemini 3.1 Pro rolls out across Google's ecosystem: developers can access it via the Gemini API, Google AI Studio, the Gemini CLI, and Android Studio, while enterprise customers get it through Vertex AI. The release signals that Google has regained AI model leadership, at least temporarily, after falling behind Anthropic and OpenAI in late 2024. The strong performance on agentic benchmarks, including MCP Atlas (69.2%), BrowseComp (85.9%), and t2-bench Telecom (99.3%), is especially significant as the industry shifts toward AI agents capable of complex, multi-step workflows. Google CEO Sundar Pichai highlighted the model's improved reasoning as a step forward for highly complex tasks such as visualizing concepts and synthesizing data. Given the rapid pace of releases, OpenAI and Anthropic are expected to counter soon, but for now, Gemini 3.1 Pro is the most powerful publicly available AI model.
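For developers trying the model through the Gemini API, a minimal sketch using the `google-genai` Python SDK looks like the following. The model identifier `gemini-3.1-pro` is an assumption inferred from the product name in this article, not a confirmed API string; check Google's model listing for the exact id before use.

```python
import os

# The google-genai SDK is Google's official Python client for the Gemini API
# (pip install google-genai). Guarded so the sketch loads even without it.
try:
    from google import genai
except ImportError:
    genai = None

# Assumed model id based on the article's product name; verify against
# Google's published model list before relying on it.
MODEL_ID = "gemini-3.1-pro"


def ask_gemini(prompt: str) -> str:
    """Send a single text prompt to the assumed Gemini 3.1 Pro model
    and return the text of its reply."""
    if genai is None:
        raise RuntimeError("google-genai SDK is not installed")
    # Reads the API key from the GEMINI_API_KEY environment variable.
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(model=MODEL_ID, contents=prompt)
    return response.text


if __name__ == "__main__" and os.environ.get("GEMINI_API_KEY"):
    print(ask_gemini("Summarize ARC-AGI-2 in one sentence."))
```

The same model would be reachable through Vertex AI for enterprise deployments, with project- and region-scoped clients instead of a bare API key.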
- Gemini 3.1 Pro leads 13 of 16 benchmarks, scoring 94.3% on GPQA Diamond and 77.1% on ARC-AGI-2 abstract reasoning (more than 2x Gemini 3 Pro).
- Excels in agentic tasks: 80.6% on SWE-Bench Verified and 33.5% on APEX-Agents, nearly double its predecessor's score and ahead of GPT-5.2 and Opus 4.6.
- Rolls out today via the Gemini API, Google AI Studio, Vertex AI, and consumer apps; rival models remain competitive in select categories.
Why It Matters
Google reclaims AI leadership with superior reasoning and agentic capabilities, intensifying the race with Anthropic and OpenAI.