LLM-based Listwise Reranking under the Effect of Positional Bias
New fine-tuning technique tackles a hidden flaw in AI search reranking, improving NDCG@10.
A team of researchers from institutions including Baidu and the University of Amsterdam has identified, and proposed a fix for, a critical flaw in how large language models (LLMs) rank search results. Their paper, "LLM-based Listwise Reranking under the Effect of Positional Bias," shows that LLMs used as rerankers suffer from a strong positional bias: passages appearing later in the input list are statistically less likely to be moved to the top of the final ranking. This undermines the core goal of reranking, which is to surface the most relevant information regardless of where it sits in the candidate list.
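One way to see the bias concretely is to rotate a known-relevant passage through every slot of the input list and track how often the reranker puts it first. Below is a minimal diagnostic sketch; the `rerank(query, passages)` interface is a hypothetical stand-in for an LLM reranker, not the paper's protocol.

```python
import random

def positional_bias_probe(rerank, query, relevant, distractors, trials=20):
    """Place a known-relevant passage at each input position and measure
    how often it is ranked first. `rerank(query, passages)` is assumed to
    return a list of indices into `passages`, best first (hypothetical)."""
    n = len(distractors) + 1
    top1_rate = []
    for pos in range(n):
        hits = 0
        for _ in range(trials):
            others = random.sample(distractors, len(distractors))  # reshuffle distractors
            passages = others[:pos] + [relevant] + others[pos:]
            hits += rerank(query, passages)[0] == pos
        top1_rate.append(hits / trials)
    return top1_rate  # an unbiased reranker yields a roughly flat curve
```

A curve that slopes downward with position is the signature of the bias the paper describes.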
To solve this, the team developed 'DebiasFirst,' a two-pronged fine-tuning method. First, it applies positional calibration, using inverse propensity scoring to re-weight the loss function and correct for the model's architectural bias. Second, it employs position-aware data augmentation, ensuring that each relevant passage appears equally often at every list position in the training data. Together, these steps significantly reduce the model's dependence on the original ranking order, improving the robustness and effectiveness (measured by metrics such as NDCG@10) of the reranker across various first-stage retrieval systems.
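The paper's exact objective isn't reproduced here, but both components can be sketched under stated assumptions: per-position propensities (for example, the top-1 rates from the probe above) and a standard listwise cross-entropy over the model's per-passage scores. The function names and the clipping constant are illustrative, not the authors' implementation.

```python
import random
import torch
import torch.nn.functional as F

def ips_listwise_loss(scores, relevant_pos, propensity):
    """Listwise cross-entropy re-weighted by inverse propensity.
    scores: tensor of shape [list_len], one reranker score per passage;
    propensity[p]: estimated P(ranked first | input position p)."""
    target = torch.tensor([relevant_pos])
    ce = F.cross_entropy(scores.unsqueeze(0), target)
    weight = 1.0 / max(propensity[relevant_pos], 1e-3)  # clip to bound variance
    return weight * ce

def position_aware_augment(relevant, distractors):
    """Emit one training list per slot so the relevant passage is seen
    equally often at every input position."""
    for pos in range(len(distractors) + 1):
        others = random.sample(distractors, len(distractors))
        yield others[:pos] + [relevant] + others[pos:], pos
```

Clipping the propensity keeps the inverse weights from exploding at positions the biased model almost never promotes, a standard precaution with inverse-propensity estimators.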
- Identifies 'positional bias,' whereby LLM rerankers systematically deprioritize passages near the end of the input list.
- Proposes 'DebiasFirst,' a fine-tuning method using calibration and data augmentation to mitigate the bias.
- Improves ranking accuracy (NDCG@10; see the sketch below) and robustness, making AI-powered search reranking more reliable.
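For reference, NDCG@10 (the headline metric above) rewards placing highly relevant passages near the top, discounting gains logarithmically by rank. A compact implementation using the common exponential-gain formulation:

```python
import math

def ndcg_at_k(ranked_rels, k=10):
    """NDCG@k over graded relevance labels, given in the order the
    reranker produced them; normalized by the ideal ordering."""
    def dcg(rels):
        return sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0

# The single relevant passage (grade 1) ranked 3rd out of 4:
print(ndcg_at_k([0, 0, 1, 0]))  # 1/log2(4) = 0.5
```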
Why It Matters
This makes AI search agents and retrieval-augmented generation (RAG) systems more accurate by ensuring the best answer isn't missed simply because of where it appeared in the candidate list.