From 0% to 36% on Day 1 of ARC-AGI-3
Achieves a 36% score on Day 1 versus 0.3% for GPT-5.4 High, at roughly one-ninth the cost of Claude Opus 4.6's run.
Symbolica's Agentica SDK has made a stunning debut on the ARC-AGI-3 benchmark, achieving an unverified score of 36.08% on its first day. The system solved 113 out of 182 playable levels and completed 7 of the 25 available games, dramatically outperforming leading language models using Chain of Thought prompting. OpenAI's GPT-5.4 High scored just 0.3%, while Anthropic's Claude Opus 4.6 Max managed only 0.2% on the same evaluation set designed to test abstract reasoning and core AGI capabilities.
Beyond raw performance, the cost efficiency is staggering. Agentica's 36.08% score cost approximately $1,005 to achieve, while Claude Opus 4.6's 0.25% score required $8,900. That makes Symbolica's run roughly 9x cheaper in absolute terms. Per percentage point of score, the gap is far wider: about $28 per point for Agentica versus about $35,600 per point for Opus 4.6. The SDK uses an agentic architecture in which AI agents take persistent actions and reason through complex puzzles step by step, rather than relying on single-prompt Chain of Thought approaches that struggle with the benchmark's abstraction requirements.
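The two cost comparisons above are easy to conflate, so here is the arithmetic spelled out. This is a small sketch using only the figures reported in this article (unverified scores and approximate costs); the dictionary layout and the `cost_per_point` helper are illustrative names, not part of any published tooling.

```python
# Figures as reported in the article: unverified scores, approximate costs in USD.
runs = {
    "Agentica SDK": {"score_pct": 36.08, "cost_usd": 1005.0},
    "Claude Opus 4.6": {"score_pct": 0.25, "cost_usd": 8900.0},
}


def cost_per_point(score_pct: float, cost_usd: float) -> float:
    """Dollars spent per percentage point of benchmark score (illustrative helper)."""
    return cost_usd / score_pct


agentica = runs["Agentica SDK"]
opus = runs["Claude Opus 4.6"]

# Ratio of absolute spend: how many times more the Opus run cost in total.
total_cost_ratio = opus["cost_usd"] / agentica["cost_usd"]

# Ratio of spend per percentage point: folds in the difference in scores as well.
per_point_ratio = cost_per_point(**opus) / cost_per_point(**agentica)

print(f"Total cost ratio: {total_cost_ratio:.1f}x")
print(f"Agentica: ${cost_per_point(**agentica):.2f} per point")
print(f"Opus 4.6: ${cost_per_point(**opus):,.0f} per point")
print(f"Per-point ratio: {per_point_ratio:,.0f}x")
```

The roughly 9x figure compares total spend only; once the much higher score is factored in, the per-point difference is on the order of a thousand times.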
The results suggest that specialized agent frameworks may outperform even frontier language models on specific reasoning tasks where persistent state and iterative problem-solving are required. Symbolica has open-sourced its implementation on GitHub, allowing developers to examine its approach to ARC-AGI-3's challenging puzzles, which require understanding abstract patterns and applying them to novel situations: a key test for developing more general AI systems.
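The article does not describe Agentica's internals, so the following is only a generic, hypothetical sketch of the pattern it contrasts: an agent that acts, observes feedback, and carries persistent state across steps, versus a single-shot attempt with no feedback loop. The toy environment (`HiddenTargetEnv`), the `AgentMemory` state, and the `run_agent` / `run_single_shot` functions are illustrative assumptions, not Symbolica's SDK API, and the puzzle is far simpler than an ARC-AGI-3 game.

```python
from dataclasses import dataclass, field


@dataclass
class HiddenTargetEnv:
    """Hypothetical toy environment: the agent must find a hidden integer.
    Each action is a guess; the environment replies 'higher', 'lower' or 'solved'."""
    target: int
    steps: int = 0

    def step(self, guess: int) -> str:
        self.steps += 1
        if guess < self.target:
            return "higher"
        if guess > self.target:
            return "lower"
        return "solved"


@dataclass
class AgentMemory:
    """Persistent state kept across actions: the range still consistent with all feedback."""
    low: int
    high: int
    history: list = field(default_factory=list)


def run_agent(env: HiddenTargetEnv, low: int, high: int, max_steps: int = 50) -> bool:
    """Iterative loop: act, observe, update memory, and reason from the updated state."""
    memory = AgentMemory(low, high)
    for _ in range(max_steps):
        guess = (memory.low + memory.high) // 2  # decide using accumulated state
        feedback = env.step(guess)
        memory.history.append((guess, feedback))
        if feedback == "solved":
            return True
        if feedback == "higher":
            memory.low = guess + 1
        else:
            memory.high = guess - 1
    return False


def run_single_shot(env: HiddenTargetEnv, low: int, high: int) -> bool:
    """Baseline: one action, no feedback loop, no persistent state."""
    return env.step((low + high) // 2) == "solved"


env = HiddenTargetEnv(target=737)
print(run_agent(env, 0, 1000), env.steps)
print(run_single_shot(HiddenTargetEnv(target=737), 0, 1000))
```

Here the looping agent always finds the target within about ten actions because each observation narrows the state it carries forward, while the single-shot baseline succeeds only by luck. Real agentic harnesses replace the midpoint rule with model-driven reasoning, but the act, observe, update structure is the same idea.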
- Scored 36.08% on ARC-AGI-3 versus 0.3% for GPT-5.4 High and 0.2% for Claude Opus 4.6 Max
- Cost $1,005 for a 36% score versus $8,900 for Opus 4.6's 0.25% score (about 9x cheaper in total, and more than 1,000x cheaper per percentage point)
- Solved 113/182 levels and 7/25 games using agentic architecture rather than Chain of Thought prompting
Why It Matters
Suggests that an agentic approach can dramatically outperform expensive frontier models on complex reasoning tasks at a fraction of the cost, though the 36.08% result has not yet been verified.