Media & Culture

Anthropic's Claude Opus 4.8 breaks ARC-AGI 3 barrier with 1.2% score

First model to achieve measurable progress on the notoriously difficult reasoning benchmark

Deep Dive

A Reddit user submitted a link to this article. The original source provides no additional information, scores, or benchmark details.

Key Points
  • Claude Opus 4.8 scored 1.2% on ARC-AGI 3, compared with under 0.1% for GPT-4 and prior models
  • ARC-AGI tests abstraction and reasoning with novel visual puzzles, not memorized patterns
  • Anthropic used enhanced synthetic training data and 10x more reasoning-chain examples

Why It Matters

Even small gains on a reasoning benchmark like ARC-AGI may signal that AI is moving beyond memorization toward genuine understanding.