Research & Papers

Attention calibration reduces positional bias in dense retrieval

No retraining needed: an inference-time tweak evens out retrieval quality wherever the relevant text sits in a passage.

Deep Dive

Dense retrieval models often suffer from positional bias: relevant information that appears later in a passage receives less attention, which degrades retrieval effectiveness. Andrianos Michail and colleagues adapt inference-time attention calibration (previously introduced by Schuhmacher et al., 2026) to downstream retrieval, adding a strength coefficient lambda that interpolates between the original and the fully calibrated attention distributions. This gives fine-grained control over how strongly the bias is corrected, without retraining.
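To make the role of lambda concrete, here is a minimal sketch that assumes the interpolation is a plain linear blend of two attention distributions over the same positions. The function name `interpolate_attention` and the toy numbers are hypothetical, and the sketch takes the calibrated distribution as given; how the authors compute it is not covered in this summary.

```python
import numpy as np


def interpolate_attention(original, calibrated, lam):
    """Blend two attention distributions over the same key positions.

    Assumption: linear interpolation, where lam=0 keeps the original
    attention and lam=1 uses the fully calibrated attention.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    original = np.asarray(original, dtype=float)
    calibrated = np.asarray(calibrated, dtype=float)
    if original.shape != calibrated.shape:
        raise ValueError("distributions must have the same shape")
    mixed = (1.0 - lam) * original + lam * calibrated
    # A convex blend of two distributions already sums to 1; renormalizing
    # only guards against floating-point drift.
    return mixed / mixed.sum(axis=-1, keepdims=True)


# Toy values (not from the paper): attention skewed toward early positions.
original = np.array([0.70, 0.20, 0.10])
calibrated = np.array([0.40, 0.30, 0.30])

half = interpolate_attention(original, calibrated, lam=0.5)
print(half)
```

With lam=0.5 the result sits halfway between the two inputs, which is the sense in which a partial setting corrects the skew only in part.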

Testing on SQuAD-PosQ and FineWeb-PosQ across three embedding models, the team found that partial calibration (B=128, lambda=0.5, 50% layer depth) outperforms full calibration, improving the harmonic mean of nDCG@10 across positional groups for all three models without per-model tuning. The method transfers to PosIR, which spans 10 languages and 31 domains, reducing the Position Sensitivity Index in all 16 length-quartile × model × retrieval-setting combinations while preserving or improving aggregate nDCG@10. The code is released on GitHub, offering a practical, drop-in fix for fairer retrieval.
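The harmonic mean is a natural summary here because it is pulled toward the weakest group, so a model cannot score well by excelling only on early-position queries. The sketch below illustrates this with made-up nDCG@10 values. The group names and numbers are hypothetical, not results from the paper, and the helper `positional_harmonic_mean` is introduced only for illustration.

```python
from statistics import harmonic_mean, mean


def positional_harmonic_mean(ndcg_by_group):
    """Harmonic mean of per-group nDCG@10 scores.

    ndcg_by_group maps a positional group label to its nDCG@10 score.
    """
    scores = list(ndcg_by_group.values())
    if not scores:
        raise ValueError("need at least one positional group")
    return harmonic_mean(scores)


# Illustrative values only: both systems have the same arithmetic mean,
# but the first is much weaker when the answer appears late.
biased = {"early": 0.80, "middle": 0.60, "late": 0.40}
balanced = {"early": 0.62, "middle": 0.60, "late": 0.58}

for name, groups in (("biased", biased), ("balanced", balanced)):
    print(name, round(mean(groups.values()), 4),
          round(positional_harmonic_mean(groups), 4))
```

Both toy systems average the same nDCG@10 arithmetically, yet the balanced one gets the higher harmonic mean, which is the kind of improvement this metric is meant to reward.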

Key Points
  • Inference-time attention calibration with lambda interpolation (no retraining needed) reduces positional bias in dense retrieval.
  • Best-performing configuration: B=128, lambda=0.5, 50% layer depth. It improves the harmonic mean of nDCG@10 across positional groups on SQuAD-PosQ and FineWeb-PosQ for all three models, without per-model tuning.
  • Transfers to 10-language, 31-domain PosIR benchmark, reducing Position Sensitivity Index in all 16 tested combinations while maintaining retrieval quality.

Why It Matters

Fairer search results without costly retraining, improving retrieval quality across languages and domains.