Research & Papers

Is Attention sink without Positional Encoding unavoidable? [D]

Removing positional encoding causes every query to attend to the same key tokens.

Deep Dive

A machine learning practitioner experimenting with small Transformers (encoder-decoder and cross-attention-only memory models) observed a persistent attention sink whenever Positional Encoding (PE) is removed. With both self-attention and cross-attention, the attention heatmaps show vertical hot lines: every query vector attends to the same few key tokens. Adding RoPE or another PE introduces diagonal patterns, but the user expected cross-attention not to need PE, since queries and keys represent different data. Regularizing the model to spread attention only widens the vertical stripes and fails to produce query-dependent attention.
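
Below is a minimal, illustrative sketch (not the poster's code; the shapes and variable names are assumptions) of the kind of diagnostic that reveals such a sink: compute attention weights with no PE anywhere and check whether a few key columns absorb most of the attention from every query.

```python
# Diagnostic sketch: detect "vertical stripes" (attention sink) in a
# no-PE attention map. Random tensors stand in for learned projections.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

n_queries, n_keys, d = 16, 32, 64
Q = torch.randn(n_queries, d)   # stand-in for learned query projections
K = torch.randn(n_keys, d)      # stand-in for learned key projections

# Scaled dot-product attention with no positional encoding anywhere.
attn = F.softmax(Q @ K.T / d ** 0.5, dim=-1)   # (n_queries, n_keys)

# Sink diagnostic: mean attention each key receives across all queries,
# and how little it varies per query. Keys with high mean and low std
# correspond to the vertical hot lines in the heatmap.
col_mean = attn.mean(dim=0)
col_std = attn.std(dim=0)
top = col_mean.topk(3)
print("keys attracting most attention:", top.indices.tolist())
print("mean attention they receive:", top.values.tolist())
print("std across queries:", col_std[top.indices].tolist())
```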

This phenomenon, known as "attention sink," is well documented in the LLM literature (e.g., Xiao et al., 2023, "Efficient Streaming Language Models with Attention Sinks"). Without PE, the model defaults to attending to a fixed set of tokens (often the first token): the softmax must place its probability mass somewhere, and a consistent, content-independent target is the most stable place to park it. Forcing dynamic attention without PE may therefore require additional architectural changes, such as query-key bias terms, gating mechanisms, or masking tricks. For practitioners, this underscores that PE is not just a convenience but a critical enabler of positional sensitivity in attention mechanisms.
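
As a hedged illustration of the most common remedy, here is a minimal sketch of applying RoPE (in the "rotate-half" formulation) to queries and keys before the dot product; the function and dimensions are assumptions for demonstration, not the poster's setup.

```python
# Sketch: rotary positional embedding applied to Q and K so that the
# attention score between positions i and j depends on (i - j).
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (seq_len, d) by position-dependent angles."""
    seq_len, d = x.shape
    half = d // 2
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)       # (seq_len, 1)
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)   # (half,)
    angles = pos * freqs                                                 # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# With RoPE, each query/key is rotated by an angle tied to its position,
# so attention can develop the diagonal, query-dependent structure the
# poster expected but did not see without PE.
Q = apply_rope(torch.randn(16, 64))
K = apply_rope(torch.randn(16, 64))
scores = Q @ K.T / 64 ** 0.5
```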

Key Points
  • Removing Positional Encoding from self/cross-attention produces vertical lines in attention heatmaps, meaning all queries attend to the same keys.
  • Even with regularization, attention only spreads more evenly over the same keys; no diagonal (query-dependent) patterns emerge (see the regularizer sketch after this list).
  • The user tried both encoder-decoder and cross-attention-only models, confirming the issue is architecture-agnostic.
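
The regularization referenced above can be sketched as a generic entropy penalty on the attention rows (an assumption, not the poster's exact loss). It shows why spreading attention does not by itself make it query-dependent: nothing in the penalty couples the distribution to query position, so the stripes only widen.

```python
# Sketch: entropy regularizer that spreads each query's attention over
# more keys, without introducing any positional dependence.
import torch
import torch.nn.functional as F

def attention_entropy_penalty(attn: torch.Tensor) -> torch.Tensor:
    """Negative mean row entropy; adding this to the training loss pushes
    each query to distribute attention over more keys."""
    entropy = -(attn * (attn + 1e-9).log()).sum(dim=-1)   # (n_queries,)
    return -entropy.mean()

# Usage inside a training step (scores would come from the model):
scores = torch.randn(16, 32, requires_grad=True)
attn = F.softmax(scores, dim=-1)
loss = attention_entropy_penalty(attn)   # + task loss in a real run
loss.backward()
```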

Why It Matters

Reinforces that Positional Encoding is essential for dynamic attention, even in cross-attention settings.