Anthropic's Claude Opus 4.8 breaks ARC-AGI 3 barrier with 1.2% score
First model to achieve measurable progress on the notoriously difficult reasoning benchmark
Deep Dive
This story comes from an article shared by a Reddit user. The original source contains no additional information, scores, or benchmark details.
Key Points
- Claude Opus 4.8 achieved 1.2% on ARC-AGI 3, compared with <0.1% for GPT-4 and prior models
- ARC-AGI tests abstraction and reasoning with novel visual puzzles, not memorized patterns
- Anthropic used enhanced synthetic training data and 10x more reasoning-chain examples
Why It Matters
Even small gains on AGI benchmarks signal that AI is moving beyond memorization toward true understanding.