GPT-5.4-Pro scores 83.3% on ARC-AGI-2, nearing parity with Gemini 3.1 Pro's 84.6%
OpenAI's latest model scores 83.3%, just 1.3 percentage points behind Google's Gemini 3.1 Pro on the demanding ARC-AGI-2 benchmark.
OpenAI's GPT-5.4-Pro model has demonstrated a significant leap in reasoning performance, scoring 83.3% on the challenging ARC-AGI-2 benchmark. This places it just 1.3 percentage points behind Google's Gemini 3.1 Pro, which leads with 84.6%. ARC-AGI-2 (the second version of the Abstraction and Reasoning Corpus for Artificial General Intelligence) is a rigorous test designed to evaluate an AI's ability to solve novel problems by identifying underlying patterns and rules, a key measure of advanced generalization. This result, shared via a viral Reddit post, indicates that OpenAI is rapidly closing what had been a perceived performance gap against its primary competitor on core reasoning tasks.
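To make concrete what "identifying underlying patterns and rules" means in practice, the sketch below mimics the structure of an ARC-style task: a few demonstration input-to-output grid pairs, a held-out test input, and all-or-nothing exact-match scoring of the predicted grid. The task, the hidden rule, and the toy solve() heuristic are illustrative assumptions for this article only, not the official ARC-AGI-2 harness or either model's actual behavior.

```python
# Illustrative sketch of an ARC-style task and exact-match scoring.
# The task, rule, and solver are hypothetical examples, not drawn
# from ARC-AGI-2 itself or from any model's real outputs.

Grid = list[list[int]]  # grids are small 2-D arrays of color indices

# Hypothetical task: the hidden rule is "mirror the grid left-to-right".
demonstrations: list[tuple[Grid, Grid]] = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[5, 5, 0], [0, 4, 4]], [[0, 5, 5], [4, 4, 0]]),
]
test_input: Grid = [[7, 0, 1], [2, 2, 0]]
test_target: Grid = [[1, 0, 7], [0, 2, 2]]

def solve(demos: list[tuple[Grid, Grid]], grid: Grid) -> Grid:
    """Toy 'solver': check whether horizontal mirroring explains every
    demonstration pair, and if so apply the same rule to the test input."""
    mirror = lambda g: [list(reversed(row)) for row in g]
    if all(mirror(inp) == out for inp, out in demos):
        return mirror(grid)
    return grid  # fall back to an identity guess

def score(prediction: Grid, target: Grid) -> int:
    """ARC-style scoring is all-or-nothing: 1 only for an exact grid match."""
    return int(prediction == target)

prediction = solve(demonstrations, test_input)
print(score(prediction, test_target))  # -> 1 if the hidden rule was recovered
```

Under this framing, the headline percentages correspond, roughly, to the share of such held-out tasks each model solves exactly.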
The near-parity score of 83.3% vs. 84.6% suggests the frontier of large language model capability is now closely contested. For developers and enterprises, the choice between OpenAI's and Google's flagship models for complex reasoning applications is no longer clear-cut on benchmark performance alone; factors like cost, latency, ecosystem integration, and task-specific performance may become the primary decision drivers. This development signals a shift toward a more balanced competitive landscape, likely accelerating innovation as both companies push to establish definitive leads in the next generation of models.
- GPT-5.4-Pro scores 83.3% on the ARC-AGI-2 benchmark, a key test for abstract reasoning.
- The score is within 1.3 percentage points of Gemini 3.1 Pro's 84.6%, indicating near performance parity.
- This closes a competitive gap, forcing model choice to hinge on cost, speed, and integration.
Why It Matters
Developers now have two near-equal top-tier models for complex reasoning, intensifying competition and likely driving down costs.