The Cost of Context: Mitigating Textual Bias in Multimodal Retrieval-Augmented Generation
Accurate context can actually make AI models hallucinate more — here's why.
A new paper from researchers at Purdue University exposes a critical failure mode in Multimodal Large Language Models (MLLMs) enhanced with Retrieval-Augmented Generation (RAG). The authors, Hoin Jung and Xiaoqian Wang, formalize a phenomenon they call 'recorruption': even when perfectly accurate 'oracle' context is provided to a capable MLLM, the model can abandon an initially correct prediction. This contradicts the assumption that more context always helps. Through a mechanistic analysis of internal attention matrices, the researchers show that recorruption stems from a two-fold attentional collapse. First, 'visual blindness' systematically suppresses the model's attention mass and sharpness for visual inputs. Second, a structural positional bias forces the model to prioritize boundary tokens over semantically relevant content, leading to a 'textual copying bias' that overrides genuine multimodal reasoning.
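The paper's exact measurements are not reproduced here, but the idea of 'visual blindness' can be made concrete with a simple diagnostic: given one layer's attention weights and a mask marking which key positions are image tokens, measure how much attention mass lands on visual keys and how sharply it is concentrated there. The sketch below is a minimal illustration under assumed tensor shapes; the function name and the entropy-based sharpness proxy are my own conventions, not the authors' code.

```python
import torch

def visual_attention_diagnostics(attn, visual_mask):
    """Probe for 'visual blindness' in a single attention map.

    attn:        (num_heads, seq_len, seq_len) softmaxed attention weights for
                 one layer of one example (hypothetical shapes).
    visual_mask: (seq_len,) bool tensor, True where the key position is an
                 image token.

    Returns (mass, entropy): the average attention mass queries place on
    visual keys, and the entropy of the attention restricted to visual keys
    (higher entropy = flatter, less sharp attention on the image).
    """
    # Mass: fraction of each query's attention that lands on visual keys.
    mass = attn[:, :, visual_mask].sum(dim=-1).mean()

    # Sharpness proxy: entropy of the attention renormalized over visual keys.
    p = attn[:, :, visual_mask]
    p = p / p.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    entropy = -(p * p.clamp_min(1e-9).log()).sum(dim=-1).mean()

    return mass.item(), entropy.item()
```

In the paper's framing, both numbers move the wrong way when retrieved text is added: mass on visual keys shrinks and the remaining attention flattens out.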
To counter this, the authors propose BAIR (Bottleneck Attention Intervention for Recovery), a parameter-free inference-time framework that restores visual saliency and applies position-aware penalties to textual distractors. BAIR requires no retraining or fine-tuning and can be dropped into existing multimodal RAG pipelines. Experiments across medical factuality, social fairness, and geospatial benchmarks demonstrate that BAIR successfully recovers multimodal grounding and improves diagnostic reliability. This research has direct implications for developers deploying RAG systems in high-stakes domains, revealing that more context is not always better—and that the way attention is distributed matters as much as the data itself.
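The article does not spell out BAIR's exact formulation, but the general shape of such an inference-time attention intervention can be sketched: boost the attention logits of visual keys to restore visual saliency, subtract a penalty on boundary-position text keys to counteract the positional bias, and re-normalize. The function below is an assumption-laden illustration of that idea, not the paper's method; the additive form and the two knobs (visual_boost, boundary_penalty) are hypothetical.

```python
import torch

def rescale_attention_logits(logits, visual_mask, boundary_mask,
                             visual_boost=1.0, boundary_penalty=1.0):
    """Illustrative inference-time attention intervention (not the exact BAIR).

    logits:           (num_heads, seq_len, seq_len) pre-softmax attention scores.
    visual_mask:      (seq_len,) bool, True where a key is an image token.
    boundary_mask:    (seq_len,) bool, True where a key is a boundary token of
                      the retrieved text (a position prone to the positional bias).
    visual_boost:     additive boost restoring visual saliency (assumed knob).
    boundary_penalty: additive penalty on boundary text keys (assumed knob).
    """
    adjusted = logits.clone()
    adjusted[:, :, visual_mask] += visual_boost        # re-weight image keys upward
    adjusted[:, :, boundary_mask] -= boundary_penalty  # damp the positional bias
    return torch.softmax(adjusted, dim=-1)             # re-normalize into attention
```

Because an adjustment of this kind only touches attention scores at inference time, it introduces no trainable parameters, which matches the paper's description of BAIR as parameter-free and retraining-free.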
- Recorruption: Adding accurate context can cause MLLMs to abandon correct predictions due to attentional collapse.
- Two mechanisms: visual blindness (suppression of visual attention mass and sharpness) and positional bias (over-prioritizing boundary tokens).
- BAIR: a parameter-free inference-time fix that restores visual saliency and penalizes textual distractors, improving performance on medical, fairness, and geospatial benchmarks.
Why It Matters
For developers, RAG isn't a magic bullet: retrieved text can trigger hidden attentional biases that degrade multimodal reliability, and BAIR shows they can be corrected at inference time without any retraining.