Research & Papers

Aitchison Embeddings for Learning Compositional Graph Representations

Graph embeddings that reveal why nodes are linked, not just how

Deep Dive

A team of researchers led by Nikolaos Nakis has introduced Aitchison Embeddings, a novel approach to graph representation learning that prioritizes interpretability without compromising predictive performance. Traditional graph embeddings like node2vec or GCNs produce high-dimensional vectors that are notoriously hard to explain, making them risky for regulated applications. The key insight is that many networks naturally admit a role-mixture view where nodes can be described as compositions over a small set of archetypal factors. By leveraging Aitchison geometry — the canonical mathematical framework for comparing compositional data — the team represents nodes as simplex-valued mixtures and maps them to Euclidean space via isometric log-ratio (ILR) coordinates. This transformation preserves Aitchison distances between nodes while allowing standard optimization techniques, and the resulting embeddings are intrinsically interpretable: each dimension corresponds to the relative prevalence of a specific archetype.
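The ILR mapping described above can be sketched in a few lines of NumPy. The function names and the Helmert-style basis below are illustrative assumptions, not the paper's implementation; any orthonormal basis of the clr hyperplane would serve equally well.

```python
import numpy as np

def ilr_basis(d):
    """Orthonormal (Helmert-style) basis for the (d-1)-dimensional ILR space."""
    V = np.zeros((d - 1, d))
    for i in range(1, d):
        V[i - 1, :i] = 1.0 / i           # contrast the first i parts...
        V[i - 1, i] = -1.0               # ...against part i+1
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V                             # rows are orthonormal and sum to zero

def clr(x):
    """Centered log-ratio: log of each part relative to the geometric mean."""
    g = np.exp(np.mean(np.log(x)))
    return np.log(x / g)

def ilr(x, V):
    """Map a composition on the simplex to unconstrained Euclidean coordinates."""
    return V @ clr(x)

def aitchison_dist(x, y):
    """Aitchison distance = Euclidean distance between clr images."""
    return np.linalg.norm(clr(x) - clr(y))

# Two hypothetical node mixtures over 3 archetypes
x = np.array([0.2, 0.3, 0.5])
y = np.array([0.5, 0.25, 0.25])
V = ilr_basis(3)
z_x, z_y = ilr(x, V), ilr(y, V)
# Euclidean distance between z_x and z_y equals the Aitchison distance
# between x and y, so ordinary gradient-based optimizers apply directly.
```

The isometry falls out of the construction: clr images lie in the hyperplane orthogonal to the all-ones vector, and the basis rows span exactly that hyperplane, so lengths are preserved.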

The method was evaluated on standard node classification and link prediction benchmarks, where it matched or exceeded strong baselines including GCNs and GraphSAGE. Crucially, Aitchison Embeddings exhibit subcompositional coherence — meaning specific archetype groups can be removed and the remaining components renormalized without breaking the geometry. This enables a unique form of model debugging: practitioners can probe how different archetype groupings influence predictions by systematically dropping subsets. For example, in a social network, the impact of 'hobby-based' versus 'location-based' clusters on friendship predictions can be isolated. The framework supports both fixed and learnable ILR bases, offering a choice between stable, predefined coordinates and contrasts adapted to the task. This work represents a step toward AI that explains its reasoning by construction, not post-hoc, which is vital for domains like healthcare, finance, and legal networks.
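The archetype-dropping probe amounts to taking a subcomposition: discard a group of parts and renormalize the rest onto a smaller simplex, which leaves the ratios among the surviving parts untouched. A minimal sketch, with hypothetical archetype labels:

```python
import numpy as np

def subcomposition(x, keep):
    """Keep only the parts indexed by `keep` and renormalize
    so the result lies on the smaller simplex."""
    s = x[np.asarray(keep)]
    return s / s.sum()

# Hypothetical node mixture over four archetypes:
# [hobby_a, hobby_b, location_a, location_b]
x = np.array([0.1, 0.2, 0.3, 0.4])

# Probe predictions with the location-based archetypes removed
no_location = subcomposition(x, [0, 1])
# The ratio hobby_a : hobby_b is the same before and after removal,
# which is exactly the subcompositional coherence the article describes.
```

Because all Aitchison-geometric quantities are built from part ratios, distances and log-ratio coordinates computed on the subcomposition remain consistent with the full composition, so the probe does not distort the geometry of the remaining archetypes.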

Key Points
  • Nodes are represented as simplex-valued mixtures over learnable archetypes using Aitchison geometry, enabling intrinsic interpretability
  • Isometric log-ratio (ILR) coordinates map compositions to Euclidean space while preserving Aitchison distances for unconstrained optimization
  • Subcompositional coherence allows principled removal of archetype groups to analyze their influence on predictions, with competitive accuracy vs. GCNs

Why It Matters

Makes graph AI interpretable by design, crucial for regulated industries needing transparency in node classification and link prediction