Research & Papers

Semantic Sections: An Atlas-Native Feature Ontology for Obstructed Representation Spaces

New research challenges how we define AI 'features,' showing current methods miss locally coherent but globally inconsistent meanings.

Deep Dive

A new research paper titled 'Semantic Sections: An Atlas-Native Feature Ontology for Obstructed Representation Spaces' proposes a fundamental shift in how we interpret features within large language models (LLMs). Authored by Hossein Javidnia, the work argues that the prevailing method of defining a 'feature'—as a single global direction or vector shared across all contexts—is inadequate for complex, 'obstructed' representation spaces. In these spaces, locally coherent concepts or meanings do not necessarily assemble into one globally consistent direction that a model uses everywhere. The paper introduces a replacement object called a 'semantic section,' defined as a transport-compatible family of local feature representatives over a 'context atlas.' This framework allows researchers to track how a feature's representation changes as the model's context shifts.
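To make the abstract definition concrete, here is a minimal sketch of the idea in NumPy. All names, shapes, and the use of orthogonal matrices as transport maps are illustrative assumptions for exposition, not the paper's actual formalism or code: a tiny "atlas" of three context charts, transport maps between adjacent charts, and a "section" assigning one local representative per chart so that transporting a representative from one chart lands on the representative of the next.

```python
import numpy as np

# Illustrative sketch (not the paper's API): a semantic section as a
# transport-compatible family of local feature vectors over a context atlas.

rng = np.random.default_rng(0)
d = 8  # feature dimension (arbitrary for the sketch)

def random_rotation(d, rng):
    """Random orthogonal matrix, standing in for a transport map between charts."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return q

# A tiny atlas: charts 0, 1, 2 with transport maps T[(i, j)] carrying
# chart i's local coordinates into chart j's.
T = {
    (0, 1): random_rotation(d, rng),
    (1, 2): random_rotation(d, rng),
}

# A section assigns one local representative per chart, compatible with
# transport: v[j] = T[(i, j)] @ v[i] on each edge of the atlas graph.
v = {0: rng.normal(size=d)}
v[1] = T[(0, 1)] @ v[0]
v[2] = T[(1, 2)] @ v[1]

# Transport compatibility holds by construction along this chain of edges.
assert np.allclose(T[(0, 1)] @ v[0], v[1])
assert np.allclose(T[(1, 2)] @ v[1], v[2])
```

The point of the sketch is that no single global vector appears anywhere: the feature exists only as the compatible family `{v[0], v[1], v[2]}` plus the transports linking them.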

Javidnia formalizes the concept, proves that 'tree-supported propagation' is always achievable, and identifies 'cycle consistency' as the critical criterion for deciding whether a locally observed feature can be globalized. This yields a taxonomy distinguishing 'tree-local,' 'globalizable,' and 'twisted' sections, where 'twisted' sections capture meanings that are locally coherent but obstructed from forming a single global vector by 'holonomy,' a geometric property in which parallel transport around a loop fails to return to its starting point. The paper then develops a practical pipeline for discovering and certifying these sections through seeded propagation, synchronization, and cycle-aware classification.
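The cycle-consistency test can be illustrated with a small, hedged sketch: compose the transport maps around a closed loop of charts and check whether the composition is (approximately) the identity. A trivial holonomy means the local representatives can be glued into one global vector; a nontrivial holonomy marks the section as twisted. The function names and the orthogonal-matrix transports below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

def random_rotation(d, rng):
    """Random orthogonal matrix, standing in for a transport map."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return q

def holonomy(loop_transports):
    """Compose transport maps around a closed loop of charts."""
    h = np.eye(loop_transports[0].shape[0])
    for t in loop_transports:
        h = t @ h
    return h

def classify(loop_transports, tol=1e-6):
    """'globalizable' if the loop composes to (approx) the identity, else 'twisted'."""
    h = holonomy(loop_transports)
    return "globalizable" if np.allclose(h, np.eye(h.shape[0]), atol=tol) else "twisted"

a = random_rotation(d, rng)
b = random_rotation(d, rng)

# Consistent loop: the closing edge undoes the first two, so holonomy is the identity.
consistent_loop = [a, b, np.linalg.inv(b @ a)]

# Obstructed loop: a generic extra rotation leaves nontrivial holonomy.
twisted_loop = [a, b, random_rotation(d, rng)]

print(classify(consistent_loop))  # composes to identity -> "globalizable"
print(classify(twisted_loop))     # generic closing edge -> "twisted"
```

This is the binary at the heart of the taxonomy: the same locally coherent data either closes up around every loop or is geometrically blocked from doing so.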

Applied to layer-16 atlases of popular models like Meta's Llama 3.2 3B Instruct, Alibaba's Qwen 2.5 3B Instruct, and Google's Gemma 2 2B IT, the method revealed nontrivial populations of all three section types. Crucially, the experiments demonstrated that 'semantic identity'—recognizing when two local activations belong to the same underlying feature—is poorly recovered by standard metrics like raw cosine similarity between global vectors. Even certified 'globalizable' sections showed low cross-context similarity, and baseline similarity methods recovered only a small fraction of true matches. In contrast, the section-based method achieved perfect identity recovery on certified supports, strongly supporting the new ontology as more accurate for interpreting modern AI models.
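Why raw cosine similarity fails here can be shown with a toy example. Under the stated assumption that the same feature's representatives in two charts are related by a transport map (modeled below as a rotation, an illustrative choice rather than the paper's setup), directly comparing the two raw vectors can give a low score even though they denote the same feature, while comparing after transporting recovers the identity exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Same underlying feature in two charts, related by a large rotation q.
q, _ = np.linalg.qr(rng.normal(size=(d, d)))
v_a = rng.normal(size=d)   # the feature's representative in chart A
v_b = q @ v_a              # the same feature's representative in chart B

raw = cosine(v_a, v_b)          # compares across charts naively; typically low
aligned = cosine(q @ v_a, v_b)  # transport into chart B first, then compare

print(f"raw cosine: {raw:.3f}, transported cosine: {aligned:.3f}")
```

The transported comparison scores 1.0 by construction, which mirrors the paper's reported contrast: perfect identity recovery on certified supports versus the collapse of raw cross-context similarity.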

Key Points
  • Introduces 'semantic sections' as a new feature ontology, arguing global vector definitions fail in obstructed spaces.
  • Formalizes sections, proves tree propagation is realizable, and uses cycle consistency to classify features as tree-local, globalizable, or twisted.
  • Tests on Llama 3.2, Qwen 2.5, and Gemma 2 show the method perfectly recovers semantic identity where raw vector similarity collapses.

Why It Matters

Provides a more accurate framework for interpreting AI models, which is crucial for improving safety, reliability, and understanding model failures.