H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models
Linear probes extract depth and distance from LLM latent spaces to expose hidden hierarchies
A team led by Cutter Dawes introduced H-probes, a collection of linear probes designed to extract hierarchical structure from the latent representations of large language models. Specifically, the probes read out two quantities from the model's hidden states: depth and pairwise distance in the underlying hierarchy. On synthetic tree traversal tasks, the probes robustly identified subspaces that encode the hierarchical information needed to complete the task. Ablation experiments confirmed that these subspaces are low-dimensional, causally critical for high performance, and generalize both in-domain and out-of-domain. This suggests that LLMs do not merely represent syntax or surface-level concepts but encode abstract hierarchical structure in geometrically meaningful ways.
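To make the idea of a linear depth probe concrete, here is a minimal sketch in Python. It is not the authors' implementation: the summary does not specify how H-probes are trained, so the ridge-regression fit, the synthetic "hidden states", and every name below (`fit_linear_probe`, `r_squared`, `rotation`, the dimensions and noise level) are assumptions chosen for illustration. The sketch plants a depth signal along one hidden direction and checks whether a linear readout recovers it on held-out samples. It covers only depth, not pairwise distance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for hidden states (assumption: not real LLM activations).
n_samples, hidden_dim, max_depth = 500, 32, 5
depth = rng.integers(0, max_depth + 1, size=n_samples).astype(float)

# One latent coordinate carries depth; the others are unrelated nuisance features.
latent = rng.normal(size=(n_samples, hidden_dim))
latent[:, 0] = depth

# A random rotation hides the depth coordinate among the hidden dimensions,
# and a little Gaussian noise is added on top.
rotation, _ = np.linalg.qr(rng.normal(size=(hidden_dim, hidden_dim)))
hidden = latent @ rotation.T + 0.05 * rng.normal(size=(n_samples, hidden_dim))


def fit_linear_probe(x, y, ridge=1e-3):
    """Ridge-regularized least squares; returns (weights, bias)."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean
    gram = xc.T @ xc + ridge * np.eye(x.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    return weights, y_mean - x_mean @ weights


def r_squared(y_true, y_pred):
    """Coefficient of determination of the probe's predictions."""
    residual = np.sum((y_true - y_pred) ** 2)
    total = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - residual / total


# Fit on the first 400 samples, evaluate on the remaining 100.
train, test = slice(0, 400), slice(400, None)
weights, bias = fit_linear_probe(hidden[train], depth[train])
r2_test = r_squared(depth[test], hidden[test] @ weights + bias)

# Cosine between the learned probe and the planted depth direction.
planted = rotation[:, 0]
alignment = abs(weights @ planted) / np.linalg.norm(weights)
print(f"held-out R^2: {r2_test:.3f}, alignment: {alignment:.3f}")
```

With real models, `hidden` would be replaced by activations collected at a chosen layer and `depth` by labels derived from the task's tree. A high held-out score on its own shows only that depth is linearly decodable, which is why the causal ablation test matters.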
Beyond synthetic benchmarks, the researchers found analogous hierarchical structures in real-world settings such as mathematical reasoning traces, though the signals were weaker. This suggests that hierarchy may be a basic primitive encoded at several levels of abstraction, including the reasoning process itself. The work provides a tool for interpreting how models organize knowledge, with implications for AI safety, debugging, and building more transparent systems. H-probes offer a scalable way to inspect the internal geometry of LLMs and surface latent organization that was previously opaque.
- H-probes are linear probes that extract depth and pairwise distance from LLM latent representations, revealing hierarchical structure.
- The hierarchy-containing subspaces are low-dimensional and causally essential for task performance, as shown by ablation experiments.
- Analogous, though weaker, hierarchical structure appears in real-world mathematical reasoning traces, suggesting the encoding is not limited to synthetic tasks.
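The ablation idea in the takeaways can be sketched as a projection that removes a low-dimensional probe subspace from each hidden state. This is an illustrative assumption, not the paper's procedure: the summary does not say how the ablations were performed, and `ablate_subspace`, the random `hidden` states, and the 2-D `directions` matrix below are hypothetical stand-ins.

```python
import numpy as np


def ablate_subspace(hidden, directions):
    """Remove the span of `directions` (hidden_dim x k) from each hidden state."""
    # Orthonormal basis for the probe subspace (reduced QR: hidden_dim x k).
    basis, _ = np.linalg.qr(directions)
    # Subtract each state's projection onto that subspace.
    return hidden - (hidden @ basis) @ basis.T


rng = np.random.default_rng(0)
hidden = rng.normal(size=(8, 16))      # 8 hypothetical hidden states of width 16
directions = rng.normal(size=(16, 2))  # hypothetical 2-D probe subspace

ablated = ablate_subspace(hidden, directions)

# After ablation, readouts along the probe directions are numerically zero.
max_readout = np.abs(ablated @ directions).max()

# A direction orthogonal to the subspace reads out exactly as before.
orthogonal = ablate_subspace(rng.normal(size=(1, 16)), directions)[0]
unchanged = np.allclose(ablated @ orthogonal, hidden @ orthogonal)

print(f"max readout after ablation: {max_readout:.2e}, rest unchanged: {unchanged}")
```

In an actual experiment, the ablated states would be written back into the model's forward pass and task accuracy compared against the unablated run. A large drop from removing only a few dimensions is what would support the claim that the subspace is causally essential.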
Why It Matters
H-probes could make AI systems more interpretable by revealing how models organize abstract reasoning hierarchies.