SumRank: Aligning Summarization Models for Long-Document Listwise Reranking
Researchers' new model cuts reranking overhead by summarizing documents before ranking them, achieving state-of-the-art results on TREC benchmarks.
A team of researchers has introduced SumRank, a model designed to solve a critical bottleneck in modern search: ranking long documents both efficiently and accurately. While Large Language Models (LLMs) excel at reranking short passages, their speed and accuracy degrade on lengthy texts, which force the model to process massive context windows. SumRank tackles this with a specialized pre-processing step: a pointwise summarization model compresses each long document into a concise, rank-aligned summary before the final listwise reranking stage, preserving the key relevance signals needed for accurate ranking.
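The compress-then-rerank flow can be sketched in a few lines. This is a hypothetical illustration only: the truncation-based `summarize` and the term-overlap `listwise_rerank` below are stand-ins for SumRank's learned summarizer and the LLM reranker, not the paper's actual models.

```python
# Hypothetical sketch of the summarize-then-rerank pipeline; the stub
# summarizer and reranker are illustrative stand-ins, not SumRank itself.

def summarize(document: str, max_words: int = 30) -> str:
    """Stand-in pointwise summarizer: keep the leading words.
    The real model instead produces a concise, rank-aligned summary."""
    return " ".join(document.split()[:max_words])

def listwise_rerank(query: str, summaries: list[str]) -> list[int]:
    """Stand-in listwise reranker: score each summary by query-term
    overlap and return document indices sorted best-first."""
    q_terms = set(query.lower().split())
    scores = [len(q_terms & set(s.lower().split())) for s in summaries]
    return sorted(range(len(summaries)), key=lambda i: -scores[i])

def rank_long_documents(query: str, documents: list[str]) -> list[int]:
    # Step 1: compress each long document independently (pointwise).
    summaries = [summarize(d) for d in documents]
    # Step 2: rerank the short summaries jointly (listwise), so the
    # expensive listwise model never sees the full-length texts.
    return listwise_rerank(query, summaries)
```

The key design point is that the listwise stage only ever sees the compressed summaries, so its cost no longer grows with the full document lengths.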
The model is trained through a sophisticated three-stage pipeline. It begins with supervised fine-tuning (SFT) for a cold start, followed by the construction of specialized reinforcement learning (RL) data. The final and crucial stage is rank-driven alignment via RL, which directly optimizes the summarizer for the downstream ranking objective. In extensive experiments across five benchmark datasets from the TREC Deep Learning tracks (2019-2023), the lightweight SumRank model achieved state-of-the-art (SOTA) ranking performance. More importantly, it delivered significant efficiency gains by drastically reducing the computational overhead for both the summarization and the subsequent reranking processes compared to applying LLMs directly to full documents.
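The rank-driven alignment in the third stage needs a reward that reflects downstream ranking quality rather than summary fluency. One plausible form, sketched below under assumptions, is an nDCG-style signal: score how well the listwise reranker orders the summaries against graded relevance labels. The function names and the availability of such labels are assumptions for illustration, not details confirmed by the paper.

```python
import math

def ndcg_at_k(ranking: list[int], relevance: dict[int, int], k: int = 10) -> float:
    """nDCG@k for a predicted ranking (list of doc ids) against graded
    relevance labels (doc id -> grade)."""
    dcg = sum(relevance.get(doc, 0) / math.log2(i + 2)
              for i, doc in enumerate(ranking[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def rank_driven_reward(query: str, summaries: list[str],
                       rerank_fn, relevance: dict[int, int]) -> float:
    """Hypothetical RL reward: run the downstream listwise reranker on the
    candidate summaries and reward the summarizer by the resulting nDCG.
    A summary that drops relevance signals lowers the reward directly."""
    ranking = rerank_fn(query, summaries)
    return ndcg_at_k(ranking, relevance)
```

Optimizing the summarizer against a reward of this shape is what ties the summaries to the ranking objective, rather than to generic summarization quality.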
- Achieves state-of-the-art ranking performance on five TREC Deep Learning track benchmarks (DL 19-23).
- Uses a three-stage RL alignment pipeline to train a summarizer that preserves ranking signals.
- Significantly improves efficiency by reducing computational overhead for both summarization and final reranking stages.
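The efficiency claim follows from simple arithmetic: a listwise prompt grows with the total length of all candidates, so compressing each candidate shrinks the prompt roughly in proportion. The token counts below are hypothetical round numbers chosen only to illustrate the scaling, not figures from the paper.

```python
# Back-of-the-envelope illustration (hypothetical numbers) of why
# compression cuts listwise reranking cost: prompt length scales with
# the combined length of all candidate documents.

def listwise_prompt_tokens(num_docs: int, tokens_per_doc: int,
                           query_tokens: int = 32) -> int:
    """Approximate prompt size for a listwise reranking call."""
    return query_tokens + num_docs * tokens_per_doc

full = listwise_prompt_tokens(num_docs=20, tokens_per_doc=2000)      # full docs
compressed = listwise_prompt_tokens(num_docs=20, tokens_per_doc=100) # summaries
# full = 40032 tokens vs compressed = 2032 tokens: roughly a 20x
# smaller prompt for the expensive listwise stage.
```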
Why It Matters
Enables faster, more accurate search over lengthy reports, legal documents, and research papers, cutting costs and latency.