AI Safety

The ML ontology and the alignment ontology

A viral essay argues the AI safety community wasted years trying to translate its core concepts into mainstream machine learning terms.

Deep Dive

AI researcher Richard Ngo's essay 'The ML Ontology and the Alignment Ontology' has sparked significant discussion within the AI community. The piece analyzes the historical and ongoing conceptual divide between AI alignment researchers—focused on long-term safety and control—and mainstream machine learning (ML) practitioners. Ngo argues the two groups operated with fundamentally different 'ontologies,' or frameworks for understanding AI systems. For alignment researchers, core concepts included 'inner vs. outer alignment,' 'mesa-optimizers,' 'corrigibility,' and 'situational awareness.' In contrast, the classic ML ontology centered on optimizing reward functions and statistical generalization.

Ngo details how this divide made communication nearly impossible, with alignment concepts often throwing a 'type error' for ML researchers. He cites the example of 'situational awareness'—an AI's understanding of its own training process and human oversight—which was nonsensical in traditional ML frameworks. The essay notes that compelling empirical evidence from large language models (LLMs) like GPT-3 and GPT-4 eventually forced the ML community to partially adopt alignment concepts, but in an ad hoc, 'shoehorned' manner. Ngo is critical of efforts like 'goal misgeneralization' research, which he views as a confused translation of the deeper alignment problem of 'inner misalignment.'

The core argument is that the alignment community, including Ngo's past self and funders like OpenPhil, made a strategic error by prioritizing efforts to make their ideas 'legible' in the ML ontology. This translation work was laborious, often created confusion, and diverted resources from developing the alignment ontology on its own terms. Ngo's conclusion is a stark recommendation: alignment researchers should 'pay less attention to the ML ontology' and focus on developing their own conceptual frameworks, trusting that empirical reality (such as the behavior of advanced LLMs) will eventually validate their concerns.

Key Points
  • The essay identifies a fundamental conceptual divide between AI safety ('alignment') and mainstream machine learning research frameworks.
  • Ngo argues efforts to translate alignment concepts like 'situational awareness' into ML terms were largely a strategic mistake that slowed progress.
  • The piece suggests the success of LLMs has forced ML to awkwardly adopt alignment ideas, validating the safety community's original concerns.

Why It Matters

Highlights a critical communication failure in AI development that may have delayed vital safety research by years.