Media & Culture

The Human Baseline for ARC-AGI-3 has been updated

The new human performance benchmark sets a 91% target for AI reasoning systems.

Deep Dive

The ARC Prize, a research initiative focused on developing artificial general intelligence (AGI), has announced a significant update to its ARC-AGI-3 benchmark: a new human performance baseline, set at 91% accuracy. The benchmark is a collection of abstract reasoning tasks designed to test whether an AI system can understand core concepts and apply them to novel situations, rather than simply recognize patterns from its training data. The update gives AI researchers worldwide a more precise and challenging target.

Previously, the benchmark lacked a definitive human performance standard, which made it difficult to gauge true progress. The new 91% baseline was established through rigorous testing with human participants and gives researchers a clear milestone for measuring AI advancement. The move is part of a broader effort to shift AI evaluation away from narrow benchmarks and toward tests that require genuine reasoning, generalization, and understanding, all key components of AGI. The ARC Prize continues to offer substantial monetary awards for systems that approach or exceed this human baseline, an incentive for breakthroughs in fundamental AI reasoning capabilities.

Key Points
  • ARC-AGI-3 benchmark now has a formal human performance baseline of 91% accuracy
  • Benchmark tests abstract reasoning and core concept understanding, not pattern memorization
  • Provides a concrete target for AGI research and a yardstick for progress beyond current AI capabilities

Why It Matters

The new baseline gives AI systems a clear, measurable target for human-like abstract reasoning and helps guide research toward true AGI.