On the Expressive Power of Contextual Relations in Transformers
A new theoretical framework proves that transformer-style models can uniformly approximate any continuous contextual relationship between words.
In a significant theoretical advance, researcher Demián Fraiman has published a paper titled 'On the Expressive Power of Contextual Relations in Transformers,' introducing a rigorous mathematical framework for understanding how AI models like GPT-4 process language. The core innovation is modeling texts not as sequences of tokens, but as probability distributions (measures) over a semantic space. Within this framework, the contextual relationship between words—how one word's meaning influences another—is formally defined as a 'coupling measure' between these distributions.
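In standard optimal-transport notation (the paper's own symbols may differ), a text with token embeddings x_1, ..., x_n in a semantic space X becomes an empirical measure, and a contextual relation between two texts becomes a coupling of their measures:

```latex
% Texts as empirical probability measures over a semantic space X
% (standard optimal-transport notation; the paper's symbols may differ).
\mu = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}, \qquad
\nu = \frac{1}{m} \sum_{j=1}^{m} \delta_{y_j}

% The set of couplings of \mu and \nu: joint measures on X \times X
% whose marginals are exactly \mu and \nu.
\Pi(\mu, \nu) = \bigl\{ \pi \in \mathcal{P}(X \times X) :
  \pi(A \times X) = \mu(A),\ \pi(X \times B) = \nu(B) \bigr\}
```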
Fraiman then proposes a new architecture called the 'Sinkhorn Transformer,' designed to operate in this measure-theoretic setting. The paper's landmark result is a universal approximation theorem: for any continuous 'coupling function' encoding a semantic relationship, and any desired accuracy, there exists a choice of the architecture's parameters that uniformly approximates it to within that accuracy. This moves beyond empirical observation to a formal guarantee about the architecture's capacity to represent complex contextual patterns.
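In schematic form (an assumed shape, since the paper's precise hypotheses and metric are not reproduced here), such a theorem reads:

```latex
% Schematic universal approximation statement (assumed shape; the paper's
% exact hypotheses, input space, and metric are not reproduced here).
% K        : a compact set of input measures
% \Gamma   : a continuous target coupling function on K
% T_\theta : a Sinkhorn Transformer with parameters \theta
% d        : a metric on the space of coupling measures
\forall \varepsilon > 0,\ \forall \Gamma \in C(K),\ \exists \theta :
\quad \sup_{\rho \in K} d\bigl(T_\theta(\rho),\, \Gamma(\rho)\bigr) < \varepsilon
```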
This work addresses a major gap in AI theory: while transformers power everything from ChatGPT to Claude, a complete mathematical characterization of their ability to model context has been elusive. By grounding the problem in measure theory and optimal transport (the 'Sinkhorn' name references the Sinkhorn-Knopp matrix-scaling algorithm used to compute entropically regularized optimal-transport couplings), the research provides tools to formally analyze and potentially improve future model architectures. It shifts the conversation from what works in practice to what is provably possible in theory.
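For context on the name: the Sinkhorn-Knopp algorithm computes such couplings by alternately rescaling the rows and columns of a kernel matrix until both marginal constraints hold. The sketch below is a minimal standalone implementation of that classic algorithm, not code from the paper; the function name and parameters are illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.1, n_iters=200):
    """Entropically regularized optimal transport via Sinkhorn iterations.

    a, b : marginal histograms (each sums to 1)
    C    : cost matrix between the supports of a and b
    reg  : entropic regularization strength
    Returns an approximate coupling matrix P whose row sums are a
    and whose column sums are b.
    """
    K = np.exp(-C / reg)           # Gibbs kernel derived from the cost
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)          # rescale to match the column marginal
        u = a / (K @ v)            # rescale to match the row marginal
    return u[:, None] * K * v[None, :]

# Toy usage: couple two short "texts" given as random 4-d embeddings.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(3, 4)), rng.normal(size=(4, 4))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)   # squared distances
P = sinkhorn(np.full(3, 1 / 3), np.full(4, 1 / 4), C)
print(P.sum(axis=1), P.sum(axis=0))   # marginals recover a and b
```

In attention terms, a matrix normalized this way satisfies constraints on both its rows and its columns, where a standard transformer's softmax normalizes rows only; presumably this is the connection the architecture's name gestures at.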
- Introduces a measure-theoretic framework where texts are probability measures and word relations are coupling measures.
- Proposes the 'Sinkhorn Transformer,' a new architecture designed for this formal mathematical setting.
- Proves a universal approximation theorem, guaranteeing the architecture can approximate any continuous semantic relationship function to arbitrary accuracy.
Why It Matters
Provides a rigorous mathematical foundation for understanding and improving state-of-the-art LLMs, guiding future architecture design.