Towards Improved Sentence Representations using Token Graphs
New pooling method maintains over 97% accuracy even when 90% of input tokens are random noise, a marked advance in the robustness of sentence embeddings.
A research team from the University of Cambridge and TU Wien has introduced GLOT (Graph-based Learning Over Tokens), a novel pooling method that dramatically improves how large language models create sentence embeddings. Published at ICLR 2026, GLOT addresses a fundamental weakness in standard approaches like mean pooling, which treat tokens as independent units and discard the rich relational information captured by transformer self-attention layers. By reframing pooling as relational learning, GLOT constructs a latent token-similarity graph from a frozen LLM's outputs, then refines representations using a lightweight graph neural network before aggregation.
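In code, the pipeline might look like the following PyTorch sketch. This is a hedged reconstruction, not the authors' released implementation: the class name GLOTPooler, the cosine-similarity graph with top-k sparsification, and the single graph-convolution layer are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GLOTPooler(nn.Module):
    """Sketch of graph-based pooling over frozen-LLM token states.

    Hypothetical reconstruction: the actual GLOT graph construction
    and GNN architecture may differ from this cosine-similarity +
    single-GCN-layer variant.
    """

    def __init__(self, hidden_dim: int, top_k: int = 8):
        super().__init__()
        self.top_k = top_k
        # The only trainable parameters; the frozen LLM contributes none.
        self.gcn = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (num_tokens, hidden_dim) from a frozen LLM.
        # 1. Latent token-similarity graph via cosine similarity.
        normed = F.normalize(token_states, dim=-1)
        sim = normed @ normed.T                        # (T, T)

        # 2. Sparsify: keep each token's top-k neighbours (assumed choice).
        k = min(self.top_k, sim.size(-1))
        topk = sim.topk(k, dim=-1)
        adj = torch.zeros_like(sim).scatter_(-1, topk.indices, topk.values)
        adj = adj / adj.sum(-1, keepdim=True).clamp(min=1e-8)  # row-normalize

        # 3. One graph-convolution step: aggregate neighbours, transform.
        refined = F.relu(self.gcn(adj @ token_states))

        # 4. Aggregate refined tokens into a single sentence embedding.
        return refined.mean(dim=0)

# Usage: pool 32 stand-in token states of width 768 into one vector.
pooler = GLOTPooler(hidden_dim=768)
sentence_vec = pooler(torch.randn(32, 768))  # shape: (768,)
```

In this sketch only the small linear layer inside the pooler is trained, mirroring the premise that the frozen LLM already encodes the relational structure the graph exploits.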
The technical breakthrough is substantial: in stress tests where 90% of tokens were random distractors, GLOT maintained over 97% accuracy while baseline methods completely collapsed. The system achieves competitive performance on benchmarks like GLUE and MTEB while using 20x fewer trainable parameters than state-of-the-art methods and speeding up training by over 100x compared to parameter-efficient fine-tuning (PEFT) approaches. This efficiency comes from GLOT's ability to leverage the existing structure in pre-trained models without modifying their weights, supported by theoretical analysis showing its expressive power. The work establishes token graph learning as a powerful paradigm for efficiently adapting frozen LLMs to downstream tasks.
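The distractor stress test can be sketched as well. The helper below (`add_random_distractors`, with Gaussian noise scaled to the clean tokens' spread) is a hypothetical reconstruction of the protocol; the article does not specify the distractor distribution or injection scheme.

```python
import torch

def add_random_distractors(token_states: torch.Tensor,
                           noise_frac: float = 0.9) -> torch.Tensor:
    """Pad a clean token sequence with random-noise states so that
    `noise_frac` of the resulting sequence is distractors.
    Hypothetical reconstruction of the paper's stress test."""
    num_clean, dim = token_states.shape
    # Solve n_noise / (n_clean + n_noise) = noise_frac for n_noise.
    num_noise = int(round(num_clean * noise_frac / (1.0 - noise_frac)))
    noise = torch.randn(num_noise, dim) * token_states.std()
    mixed = torch.cat([token_states, noise], dim=0)
    # Shuffle so distractors are interleaved with real tokens.
    return mixed[torch.randperm(mixed.size(0))]
```

Intuitively, random distractors form weakly connected nodes in the token-similarity graph, so graph refinement can down-weight them before aggregation, whereas mean pooling averages them in directly.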
- GLOT maintains over 97% accuracy even when 90% of input tokens are random noise, demonstrating exceptional robustness
- Uses 20x fewer trainable parameters than state-of-the-art methods while staying competitive on GLUE and MTEB benchmarks
- Trains over 100x faster than parameter-efficient fine-tuning (PEFT) methods by operating on frozen LLM outputs
Why It Matters
GLOT yields substantially better sentence embeddings from existing LLMs without expensive retraining, which can directly improve retrieval-augmented generation (RAG) and classification systems.