The ARC-AGI leaderboard made me realize something terrifying (but weirdly comforting) about LLMs vs human brains
Models like Gemini 3.1 Pro score just ~0.2% on ARC-AGI's visual puzzles, exposing a core limitation relative to human intelligence.
A viral Reddit analysis of the ARC-AGI-3 benchmark has sparked a major conversation about the true nature of LLM intelligence. The post notes that top models like Google's Gemini 3.1 Pro and Anthropic's Claude Opus, which can pass bar exams and write complex code, score a dismal ~0.2% on ARC-AGI's visual puzzles. The benchmark, created by François Chollet, requires reasoning about novel 2D spatial relationships without relying on memorized text patterns. This failure suggests that LLMs are "brains in a jar": powerful next-token predictors trained on human text, devoid of sensory experience or grounded models of the physical world they describe.
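To make that concrete, here is a minimal, hypothetical sketch of the kind of task ARC-style puzzles pose. The grids, the inversion rule, and the `apply_rule` helper are all invented for illustration (real ARC tasks are larger and far less obvious); the point is the format: a handful of input/output grid pairs, from which the solver must induce the transformation with no text to pattern-match against.

```python
# Toy illustration of an ARC-style task (hypothetical, not an actual
# ARC-AGI puzzle). Each task supplies a few input/output grid pairs;
# the solver must induce the transformation rule from those examples alone.

train_pairs = [
    ([[0, 1], [1, 0]], [[1, 0], [0, 1]]),  # rule: invert every cell
    ([[1, 1], [0, 1]], [[0, 0], [1, 0]]),
]

def apply_rule(grid):
    """The rule a human infers in seconds: flip every cell, 0 <-> 1."""
    return [[1 - cell for cell in row] for row in grid]

# Check the induced rule against the training pairs.
for inp, out in train_pairs:
    assert apply_rule(inp) == out

# A held-out test input: the induced rule, not memorization, yields the answer.
print(apply_rule([[0, 0], [1, 1]]))  # -> [[1, 1], [0, 0]]
```

Humans solve tasks of this shape almost instantly; for an LLM, there is no corpus of prior text in which this exact rule appears, which is precisely the gap the benchmark is designed to probe.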
The author argues this isn't a failure of the models, but a misalignment of expectations. Comparing an LLM to a human is like comparing an excavator's strength to a soccer player's agility; they are different specialized systems. Human intelligence evolved to navigate a chaotic 3D world on 20 watts of power, incorporating touch, sight, and emotion. LLMs, in contrast, are a form of externalized cultural memory—a "hard drive for our species." The key takeaway is that AGI may not be a super-human consciousness, but a profoundly useful, complementary intelligence. Professionals should view LLMs as powerful engines for specific tasks, not as pilots replacing human judgment and embodied understanding.
- Top LLMs like Gemini 3.1 Pro score only ~0.2% on the ARC-AGI-3 visual reasoning benchmark, revealing a stark performance gap versus textual tasks.
- The failure highlights LLMs' lack of embodied, sensory experience; they are "stochastic parrots" for text without grounded models of physical reality.
- The argument reframes AGI not as super-human intelligence, but as a specialized tool—an "external hard drive" complementing human strengths like adaptation and social reasoning.
Why It Matters
For professionals, the analysis clarifies LLMs' role: unparalleled for information synthesis, but not a replacement for human judgment, creativity, or physical-world intuition.