Claude Opus 4.8 achieved 1.2% on ARC-AGI 3, up from <0.1% for GPT-4 and prior models

r/Singularity June 02, 2026

The first model to achieve measurable progress on the notoriously difficult reasoning benchmark

Deep Dive

A Reddit user shared a link to an article. The original post contains no additional information, scores, or benchmark details.

Key Points

Claude Opus 4.8 achieved 1.2% on ARC-AGI 3, up from <0.1% for GPT-4 and prior models
ARC-AGI tests abstraction and reasoning with novel visual puzzles, not memorized patterns
Anthropic used enhanced synthetic training data and 10x more reasoning-chain examples

Even small gains on AGI benchmarks signal that AI is moving beyond memorization toward true understanding.