AI Safety

Convergent Abstraction Hypothesis: AI learns like convergent evolution

LessWrong AI May 15, 2026

⚡ Sharks and dolphins evolved similar shapes; AI models may converge on similar concepts.

Deep Dive

The Convergent Abstraction Hypothesis, introduced by Jan_Kulveit on LessWrong, suggests that cognitive systems, whether biological brains or AI models, independently arrive at similar abstractions when they develop in comparable environments under analogous selection pressures. Drawing on convergent evolution (e.g., sharks and dolphins sharing streamlined bodies because of hydrodynamics), Kulveit argues that abstractions like 'objectness' or arithmetic emerge repeatedly because they offer extreme compression: an apple's ~10^25 degrees of freedom can be reduced to a handful of variables such as position and momentum. This pressure to compress is universal, from physics to SGD-trained neural networks, and many mathematical structures (PCA, Fourier transforms) serve the same efficient-coding role.
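To make the compression idea concrete, here is a minimal sketch using PCA, one of the structures named above. It is an illustration under stated assumptions, not anything from Kulveit's post: the toy data, the function names `make_toy_data` and `pca_compress`, and all sizes are hypothetical. The data has 200 observed dimensions but is generated from only 3 hidden variables, so a 3-number code should recover almost all of it.

```python
import numpy as np


def make_toy_data(n_samples=500, n_dims=200, n_latent=3, noise=0.01, seed=0):
    """Hypothetical toy data: many observed dimensions, few underlying variables."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, n_latent))   # the "few variables"
    mixing = rng.normal(size=(n_latent, n_dims))      # how they show up in observations
    return latent @ mixing + noise * rng.normal(size=(n_samples, n_dims))


def pca_compress(X, k):
    """Project the rows of X onto the top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    codes = Xc @ components.T                          # k numbers per sample
    explained = (S[:k] ** 2).sum() / (S ** 2).sum()    # fraction of variance kept
    return codes, components, mean, explained


X = make_toy_data()
codes, components, mean, explained = pca_compress(X, k=3)

# Rebuild the 200-dimensional observations from the 3-dimensional codes.
reconstruction = codes @ components + mean
rel_error = np.linalg.norm(X - reconstruction) / np.linalg.norm(X)

print("compressed shape:", codes.shape)
print("variance explained:", explained)
print("relative reconstruction error:", rel_error)
```

Any learner rewarded for predicting this data cheaply has a reason to find something equivalent to those 3 variables, which is the intuition behind convergence. Real environments are far less clean than this linear toy.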

However, Kulveit stresses that this convergence is contingent, not inevitable. Just as squids evolved differently from sharks due to lacking a vertebral column, changes in architecture, training data distribution, or optimization regime could disrupt convergence. The hypothesis offers a middle ground: empirical evidence of convergence (e.g., vision models learning edge detectors or word embeddings clustering similar meanings) is real and useful for alignment, but practitioners should not assume these abstractions are universally natural or stable. This matters for safety, as robust alignment may need to account for fragility when models encounter distribution shifts or different objectives.

Key Points

Convergence driven by universal pressure to compress: an apple's ~10^25 degrees of freedom reduced to a few variables such as position and momentum.
Shared physical environment (e.g., translation invariance) forces similar abstractions across species and AI systems.
Unlike 'natural abstractions', this hypothesis acknowledges fragility: changes in architecture or training can break convergence.

Why It Matters

For AI alignment, it warns that learned abstractions may be contingent, not universal, requiring careful validation across different models.
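One way such cross-model validation could look in practice is to compare the internal representations of two models on the same inputs. The sketch below uses linear CKA (centered kernel alignment), a similarity measure chosen here purely for illustration; the source does not name a method. The "models" are hypothetical random activation matrices, not real networks.

```python
import numpy as np


def linear_cka(A, B):
    """Linear CKA between activation matrices with shape (n_inputs, n_features).

    Scores lie in [0, 1]; the measure ignores rotations and uniform rescaling
    of the feature space, so it compares structure rather than exact coordinates.
    """
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    cross = np.linalg.norm(A.T @ B, "fro") ** 2
    norm_a = np.linalg.norm(A.T @ A, "fro")
    norm_b = np.linalg.norm(B.T @ B, "fro")
    return cross / (norm_a * norm_b)


rng = np.random.default_rng(0)

# Hypothetical "model A": 32 features driven by 5 shared underlying variables.
shared = rng.normal(size=(1000, 5))
model_a = shared @ rng.normal(size=(5, 32))

# Hypothetical "model B": the same structure in a rotated, rescaled basis.
Q, _ = np.linalg.qr(rng.normal(size=(32, 32)))
model_b = 2.0 * model_a @ Q

# Hypothetical "model C": features unrelated to the shared variables.
model_c = rng.normal(size=(1000, 32))

cka_same = linear_cka(model_a, model_b)
cka_unrelated = linear_cka(model_a, model_c)

print("A vs B (same abstraction, different basis):", cka_same)
print("A vs C (unrelated features):", cka_unrelated)
```

A high score on one pair of models says nothing about a third. Under the hypothesis, the comparison would need to be repeated whenever architecture, data, or training regime changes.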
