Research & Papers

Scaling Laws for Reranking in Information Retrieval

A new study shows that reranking models follow predictable power laws, enabling accurate forecasting of large-scale system performance from small-scale experiments.

Deep Dive

A team of researchers including Rahul Seetharaman, Aman Bansal, Hamed Zamani, and Kaustubh Dhole has published the first comprehensive study of scaling laws specifically for reranking systems in information retrieval. Scaling laws are well documented for tasks such as natural language generation and dense retrieval, but this work fills a gap in our understanding of multi-stage retrieval systems, where reranking is the final and most influential step before results reach users. The study systematically analyzes performance across model sizes and data budgets for three popular reranking paradigms (pointwise, pairwise, and listwise), using cross-encoder rerankers as a detailed case study.
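The three paradigms differ in what the model scores at once: one document, a pair of documents, or the whole candidate list. A minimal sketch of the three interfaces, where `relevance` is a hypothetical keyword-overlap stub standing in for a learned cross-encoder (not the paper's models):

```python
from itertools import combinations

def relevance(query: str, doc: str) -> float:
    # Hypothetical stand-in for a cross-encoder score: keyword overlap.
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def pointwise_rerank(query, docs):
    # Score each (query, doc) pair independently, then sort by score.
    return sorted(docs, key=lambda d: relevance(query, d), reverse=True)

def pairwise_rerank(query, docs):
    # Compare documents two at a time; rank by number of pairwise wins.
    wins = {d: 0 for d in docs}
    for a, b in combinations(docs, 2):
        winner = a if relevance(query, a) >= relevance(query, b) else b
        wins[winner] += 1
    return sorted(docs, key=lambda d: wins[d], reverse=True)

def listwise_rerank(query, docs):
    # Score the candidate list jointly; a real listwise model conditions
    # on all documents at once rather than scoring them independently.
    scores = [relevance(query, d) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order]
```

With a learned scorer, the pairwise and listwise variants can capture inter-document signals that pointwise scoring cannot, at higher inference cost.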

The research demonstrates that reranker performance follows predictable power-law relationships, allowing the performance of larger models to be forecast accurately from smaller-scale experiments. For example, the team estimated the NDCG (Normalized Discounted Cumulative Gain) of a 1B-parameter model by training and evaluating only models up to 400M parameters, obtaining reliable predictions in both in-domain and out-of-domain settings. The study shows that downstream metrics such as NDCG and MAP (Mean Average Precision) scale consistently and can be forecast accurately, while metrics such as Contrastive Entropy and MRR (Mean Reciprocal Rank) do not always follow predictable patterns. These findings establish fundamental scaling principles for reranking and offer a practical methodology for building industrial-grade retrieval systems while conserving computational resources.
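The forecasting idea can be illustrated with a toy fit. This sketch assumes a simple power law of the form 1 − NDCG ≈ a·N^(−α) (the constants, model sizes, and synthetic observations below are hypothetical, not the paper's fitted values): fit the law in log-log space on small models only, then extrapolate to 1B parameters.

```python
import numpy as np

# Synthetic "observations" from small rerankers, generated from an
# assumed power law: the headroom 1 - NDCG shrinks with model size N.
sizes = np.array([30e6, 60e6, 125e6, 250e6, 400e6])  # params, small models only
a, alpha = 2.0, 0.18                                 # hypothetical constants
ndcg = 1.0 - a * sizes ** (-alpha)

# Fit the power law in log-log space (a line: log(1-NDCG) = log a - alpha*log N).
slope, intercept = np.polyfit(np.log(sizes), np.log(1.0 - ndcg), 1)
alpha_hat, a_hat = -slope, np.exp(intercept)

# Extrapolate to a 1B-parameter model that was never trained.
ndcg_1b = 1.0 - a_hat * (1e9) ** (-alpha_hat)
print(f"forecast NDCG at 1B params: {ndcg_1b:.3f}")
```

On real measurements the fit would carry noise, so the paper's contribution is precisely showing which metrics (NDCG, MAP) make such extrapolations reliable.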

Key Points
  • First systematic study of scaling laws for rerankers across pointwise, pairwise, and listwise paradigms
  • Demonstrates predictable power laws enabling accurate forecasting of 1B-parameter model performance using 400M-parameter experiments
  • Shows NDCG and MAP metrics scale reliably while Contrastive Entropy and MRR show inconsistent patterns
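NDCG, the metric the scaling fits forecast most reliably, is computable in a few lines. A minimal sketch using graded relevance labels in ranked order (illustrative data, not from the study):

```python
import math

def dcg(rels):
    # Discounted cumulative gain: higher grades earlier count more.
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0
```

A perfect ranking yields 1.0; swapping relevant and irrelevant documents lowers the score toward 0.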

Why It Matters

Enables companies to forecast large-scale retrieval system performance without expensive training, saving computational resources.