Research & Papers

ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models

Multi-anchor embeddings go mainstream with 98.7% fewer trainable parameters and competitive, sometimes higher, accuracy.

Deep Dive

Researchers Orhan Demirci and Sezer Aptourachman have introduced Adaptive Dictionary Embeddings (ADE), a framework that scales multi-anchor word representations to large language models. Traditional word embeddings represent each word with a single vector, which creates a bottleneck for polysemous words, since all of a word's senses must share one point in embedding space. Multi-anchor representations, which use multiple vectors per word, have shown promise but were previously limited to small-scale models because of their computational cost. ADE overcomes this with three key innovations: Vocabulary Projection (VP), which collapses the costly two-stage anchor lookup into a single efficient matrix operation; Grouped Positional Encoding (GPE), which lets the anchors of a word share positional information while varying semantically; and context-aware anchor reweighting, which uses self-attention to compose anchor contributions dynamically from the sequence context.
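To make the three mechanisms concrete, here is a minimal PyTorch sketch of what an ADE-style embedding layer could look like. It is an illustration under assumptions, not the authors' implementation: the shared anchor bank, the anchor count K, the bank size, and the pooled-context attention that stands in for the paper's self-attention reweighting are all choices made for the example, and names such as MultiAnchorEmbedding, vocab_proj, and anchor_bank are hypothetical.

    # Hypothetical sketch of an ADE-style multi-anchor embedding layer.
    # The factorization, K, and the attention form are assumptions for
    # illustration, not the design described in the paper.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiAnchorEmbedding(nn.Module):
        def __init__(self, vocab_size, num_anchors_per_word,
                     anchor_bank_size, dim, max_len):
            super().__init__()
            self.K = num_anchors_per_word
            # Shared anchor bank: far fewer rows than vocab_size * K, which
            # is one way such a layer could undercut a full V x dim table.
            self.anchor_bank = nn.Parameter(
                torch.randn(anchor_bank_size, dim) * 0.02)
            # "Vocabulary Projection" stand-in: one learned matrix mapping
            # each (word, anchor-slot) pair to a mixture over the bank, so
            # anchor lookup is a single matmul instead of a two-stage gather.
            self.vocab_proj = nn.Parameter(
                torch.randn(vocab_size * self.K, anchor_bank_size) * 0.02)
            # Grouped positional encoding: one vector per token position,
            # shared by all K anchors of that token.
            self.pos_emb = nn.Embedding(max_len, dim)
            # Context query used to reweight anchors.
            self.ctx_query = nn.Linear(dim, dim)

        def forward(self, token_ids):
            B, T = token_ids.shape
            # Indices of the K anchor slots for every token: (B, T, K)
            slot_ids = token_ids.unsqueeze(-1) * self.K + torch.arange(
                self.K, device=token_ids.device)
            # Single-matrix anchor lookup: row-select the projection, then
            # one matmul with the shared bank -> (B, T, K, dim)
            anchors = self.vocab_proj[slot_ids] @ self.anchor_bank
            # Grouped positional encoding: added once per position and
            # broadcast over the K anchors, so they share positional
            # information while keeping distinct semantic content.
            positions = torch.arange(T, device=token_ids.device)
            anchors = anchors + self.pos_emb(positions)[None, :, None, :]
            # Context-aware reweighting (simplified): score each anchor
            # against a pooled sequence context and softmax over K.
            context = self.ctx_query(anchors.mean(dim=(1, 2)))      # (B, dim)
            scores = torch.einsum('btkd,bd->btk', anchors, context)
            weights = F.softmax(scores / anchors.shape[-1] ** 0.5, dim=-1)
            # Weighted sum collapses the K anchors to one vector per token.
            return torch.einsum('btk,btkd->btd', weights, anchors)

    # Example: 30k-word vocabulary, 4 anchors per word, 256 shared anchors.
    emb = MultiAnchorEmbedding(vocab_size=30_000, num_anchors_per_word=4,
                               anchor_bank_size=256, dim=768, max_len=512)
    out = emb(torch.randint(0, 30_000, (2, 16)))   # -> shape (2, 16, 768)

A layer like this would sit in place of the usual token embedding table at the bottom of a transformer; because the anchor bank is shared across the vocabulary and the per-word mixtures are low-dimensional, its parameter count can be a small fraction of a full vocabulary-by-dimension table, which is the kind of saving the embedding compression reported below points to.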

Integrated into a Segment-Aware Transformer (SAT), ADE was evaluated on the AG News and DBpedia-14 text classification benchmarks. With 98.7% fewer trainable parameters than DeBERTa-v3-base, ADE surpasses DeBERTa on DBpedia-14 (98.06% vs. 97.80%) and approaches it on AG News (90.64% vs. 94.50%), while compressing the embedding layer by more than 40x. This demonstrates that multi-anchor representations are a practical, parameter-efficient alternative to single-vector embeddings in modern transformer architectures, potentially enabling more efficient and semantically richer language models.
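For a rough sense of scale, and assuming a DeBERTa-v3-base-like embedding configuration (roughly a 128k-token vocabulary with 768-dimensional embeddings; these baseline figures are assumptions, not numbers taken from the ADE paper), the reported compression would bound the ADE embedding layer at a few million parameters:

    # Back-of-the-envelope check of ">40x embedding compression", assuming a
    # DeBERTa-v3-base-like embedding table (~128k vocab x 768 dims).
    vocab_size, hidden = 128_100, 768
    baseline_embed_params = vocab_size * hidden          # ~98.4M parameters
    ade_embed_upper_bound = baseline_embed_params / 40   # ">40x" -> under ~2.5M
    print(f"{baseline_embed_params / 1e6:.1f}M -> under "
          f"{ade_embed_upper_bound / 1e6:.1f}M")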

Key Points
  • ADE compresses the embedding layer by over 40x while outperforming DeBERTa-v3-base on DBpedia-14 (98.06% vs. 97.80%)
  • Vocabulary Projection replaces costly two-stage anchor lookup with a single efficient matrix operation
  • Grouped Positional Encoding preserves semantic coherence while enabling anchor-level variation within the same word

Why It Matters

Enables more efficient LLMs with richer semantic representations, reducing the embedding memory footprint with little or no loss in accuracy.