ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models
Multi-anchor embeddings go mainstream with 98.7% fewer parameters and higher accuracy.
Researchers Orhan Demirci and Sezer Aptourachman have introduced Adaptive Dictionary Embeddings (ADE), a framework that successfully scales multi-anchor word representations to large language models. Traditional word embeddings represent each word with a single vector, creating bottlenecks for polysemous words. Multi-anchor representations, which use multiple vectors per word, have shown promise but were previously limited to small-scale models due to computational inefficiency. ADE overcomes this with three key innovations: Vocabulary Projection (VP), which collapses the costly two-stage anchor lookup into a single efficient matrix operation; Grouped Positional Encoding (GPE), which lets the anchors of a word share positional information while varying semantically; and context-aware anchor reweighting, which uses self-attention to dynamically compose anchor contributions based on sequence context.
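To make the three mechanisms concrete, here is a minimal, self-contained sketch of the multi-anchor idea as described above: a shared anchor dictionary, a vocabulary-projection matrix that gives each word its anchor mixing weights in one matrix row lookup, and a context-dependent reweighting of those anchors. All sizes, names, and values are illustrative assumptions for this sketch, not the authors' actual implementation.

```python
import math

VOCAB, NUM_ANCHORS, DIM = 4, 3, 2

# Shared anchor dictionary D: NUM_ANCHORS vectors of size DIM.
D = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

# Vocabulary Projection P: one row per word holds its anchor mixing
# weights, replacing a two-stage (word -> anchor ids -> vectors) lookup
# with a single matrix operation (here, a row selection + weighted sum).
P = [[0.9, 0.1, 0.0],
     [0.0, 0.8, 0.2],
     [0.5, 0.0, 0.5],
     [0.2, 0.3, 0.5]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def embed(word_id, context):
    """Compose a word's embedding from its anchors, reweighted by context.

    `context` is a DIM-sized vector standing in for the sequence-level
    query that the paper derives with self-attention.
    """
    gate = P[word_id]
    # Attention-style scores: how relevant each anchor is to this context,
    # gated by the word's own mixing weights (log-space so that a zero
    # weight effectively masks the anchor out).
    scores = [dot(D[j], context) + math.log(gate[j] + 1e-9)
              for j in range(NUM_ANCHORS)]
    alpha = softmax(scores)
    # Weighted sum of anchors -> final context-dependent embedding.
    return [sum(alpha[j] * D[j][d] for j in range(NUM_ANCHORS))
            for d in range(DIM)]

# Same word, two different contexts -> two different anchor compositions.
e1 = embed(1, [1.0, 0.0])
e2 = embed(1, [0.0, 1.0])
print(e1, e2)
```

The parameter saving falls out of the shapes: the full embedding table of a standard model (vocab x dim) is replaced by a small anchor dictionary plus a mixing matrix, which is where a 40x compression of the embedding layer becomes plausible.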
Integrated into a Segment-Aware Transformer (SAT), ADE was evaluated on the AG News and DBpedia-14 text classification benchmarks. With 98.7% fewer trainable parameters than DeBERTa-v3-base, ADE surpasses DeBERTa on DBpedia-14 (98.06% vs. 97.80%) and approaches it on AG News (90.64% vs. 94.50%), while compressing the embedding layer by more than 40x. This demonstrates that multi-anchor representations are a practical and parameter-efficient alternative to single-vector embeddings in modern transformer architectures, potentially enabling more efficient and semantically richer language models.
- ADE compresses the embedding layer by over 40x while outperforming DeBERTa-v3-base on DBpedia-14 (98.06% vs. 97.80%)
- Vocabulary Projection replaces costly two-stage anchor lookup with a single efficient matrix operation
- Grouped Positional Encoding preserves semantic coherence while enabling anchor-level variation within the same word
Why It Matters
Enables more efficient LLMs with richer semantic representations, reducing memory footprint without sacrificing accuracy.