Research & Papers

The Rise and Fall of G in AGI

A new paper finds that the share of performance variance explained by a single 'G-factor' across AI models dropped from 92% to 64% as specialized reasoning models emerged.

Deep Dive

A provocative new paper by David C. Krakauer applies classic psychometric analysis, specifically Spearman's g-factor concept, to the evolution of artificial intelligence. Treating 39 major AI model releases from 2019-2025 as 'subjects' and 14 common benchmarks as 'cognitive tests,' the study builds a models × benchmarks × time matrix. Principal component analysis reveals a strong 'positive manifold': performance on all benchmarks is positively correlated, indicating a unified 'G-factor' in AI analogous to human general intelligence. This factor peaked in 2023-2024, when it explained up to 92% of performance variance across the core benchmarks.
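To make the setup concrete, here is a minimal sketch of that kind of analysis in Python with NumPy and scikit-learn (my choice of tools, not the paper's): standardize a models × benchmarks score matrix, run PCA, and read off how much variance the first component, the G-factor analogue, explains. The scores below are synthetic placeholders, not the paper's data or results.

```python
# Sketch of the psychometric setup: rows are model releases ("subjects"),
# columns are benchmarks ("tests"). 39 models x 14 benchmarks mirrors the
# study's dimensions, but the numbers are random placeholders.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_models, n_benchmarks = 39, 14

# Fake a "positive manifold": one shared ability factor plus benchmark-specific noise.
general_ability = rng.normal(size=(n_models, 1))
scores = 0.9 * general_ability + 0.3 * rng.normal(size=(n_models, n_benchmarks))

# Standardize each benchmark, then ask how much variance the first
# principal component (the G-factor analogue) explains.
z = StandardScaler().fit_transform(scores)
pca = PCA().fit(z)
g_share = pca.explained_variance_ratio_[0]
print(f"Variance explained by the first component: {g_share:.0%}")
```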

However, the study documents a significant shift beginning in 2024. The share of variance explained by the unifying G-factor fell sharply to 64% as reasoning-specialized models entered the landscape. Krakauer ties this decline to models increasingly 'outsourcing' reasoning to external tools and developing specialized capabilities. The paper argues this represents a transition from 'AI-hedgehogs' (unified general systems) to 'AI-foxes' (diverse specialized systems). Specialization beneath the surface of general benchmarks suggests the field is moving toward more modular, capability-specific architectures rather than pursuing monolithic AGI through scaling alone.

The analysis also introduces the idea of a 'Ptolemaic Succession' in AI development: instead of converging on simpler, more elegant mechanisms, architectures become increasingly complex and hierarchical to achieve new capabilities. This has implications for how we measure progress toward AGI, suggesting that benchmark batteries need to evolve beyond testing unified intelligence to assess specialized cognitive subsystems. The temporal analysis of partial correlation matrices supplies the evidence for this specialization, which steadily improving aggregate benchmark scores would otherwise mask.
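One standard way to obtain such partial correlations, assumed here rather than taken from the paper, is from the inverse of the benchmark correlation matrix within each time window; off-diagonal entries that shrink across successive windows would indicate exactly this kind of specialization. The data below are placeholders, not the study's measurements.

```python
# Hedged sketch: partial correlations between benchmarks, controlling for all
# other benchmarks, computed from the inverse correlation (precision) matrix
# for a single time window.
import numpy as np

def partial_correlations(scores: np.ndarray) -> np.ndarray:
    """scores: (n_models, n_benchmarks) matrix for one time window."""
    corr = np.corrcoef(scores, rowvar=False)   # benchmarks as variables
    precision = np.linalg.pinv(corr)           # pseudo-inverse for numerical stability
    d = np.sqrt(np.outer(np.diag(precision), np.diag(precision)))
    pcorr = -precision / d                     # standard precision-to-partial-correlation formula
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

rng = np.random.default_rng(1)
window = rng.normal(size=(20, 14))             # placeholder scores: 20 models, 14 benchmarks
print(partial_correlations(window).round(2))
```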

Key Points
  • The study analyzed 39 AI models (2019-2025) across 14 benchmarks using psychometric methods, finding that a unified 'G-factor' peaked at 92% of variance explained in 2023-2024.
  • The factor's share of explained variance fell to 64% after reasoning-specialized models began arriving in 2024, indicating diversification into 'AI-foxes' rather than a unified 'AI-hedgehog.'
  • The research suggests AI development follows a 'Ptolemaic Succession' of increasing architectural complexity rather than discovering parsimonious general mechanisms.

Why It Matters

The findings challenge how we measure progress toward AGI and suggest that future AI systems will be modular specialists rather than monolithic generalists.