New paper cuts LLM reranking token use by up to 37% with self-distillation
AI models 'overthink' rankings — researchers trim reasoning without losing accuracy.
Listwise reranking with large language models has become a leading approach in information retrieval, and reasoning-enhanced variants use Chain-of-Thought (CoT) to compare candidate documents in depth. That performance comes at a steep computational cost, however: these models often generate thousands of reasoning tokens before outputting a final ranking. In a new information retrieval paper, Danyang Liu and Kan Li systematically study how reasoning length relates to ranking quality. They uncover a clear 'overthinking' phenomenon: beyond a certain point, longer reasoning yields negligible gains in ranking quality while still consuming compute. The finding challenges the assumption that more reasoning always leads to better outcomes.
To address this inefficiency, the authors propose a Length-Regularized Self-Distillation (LRSD) framework. They first synthesize a dataset by sampling diverse reasoning traces from a teacher model (Rank-K), then apply a Pareto-inspired filter that keeps traces achieving high ranking performance with minimal token usage. A student model is fine-tuned on these concise, high-quality rationales, learning to internalize efficient reasoning patterns and prune redundant deliberation. On the TREC Deep Learning and NeuCLIR benchmarks, LRSD maintains the teacher's effectiveness while reducing inference token consumption by 34%–37% across retrieval settings. This makes reasoning-enhanced rerankers more practical to deploy in latency-sensitive applications such as search engines and real-time question answering.
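The summary does not spell out the paper's exact filtering criterion, so the following is only a minimal Python sketch of what a Pareto-style trace filter could look like. It assumes each sampled teacher trace carries a ranking-quality score (for example a metric such as nDCG, which is an assumption here) and its reasoning-token count. A trace is discarded if another trace is at least as good on both axes and strictly better on one. `Trace`, `pareto_front`, `select_trace`, and the `tolerance` selection rule are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """One sampled reasoning trace from the teacher (hypothetical structure)."""
    text: str
    score: float  # quality of the trace's final ranking (metric choice is assumed)
    tokens: int   # number of reasoning tokens the trace used


def dominates(a: Trace, b: Trace) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.score >= b.score and a.tokens <= b.tokens
    strictly_better = a.score > b.score or a.tokens < b.tokens
    return no_worse and strictly_better


def pareto_front(traces: list[Trace]) -> list[Trace]:
    """Keep traces no other trace dominates (O(n^2), fine for a few samples per query)."""
    return [t for t in traces if not any(dominates(o, t) for o in traces)]


def select_trace(traces: list[Trace], tolerance: float = 0.0) -> Trace:
    """Hypothetical selection rule: among front traces whose score is within
    `tolerance` of the best, return the one with the fewest reasoning tokens."""
    front = pareto_front(traces)
    best = max(t.score for t in front)
    eligible = [t for t in front if t.score >= best - tolerance]
    return min(eligible, key=lambda t: t.tokens)


# Illustrative toy inputs only, not data from the paper.
samples = [
    Trace("a", 0.72, 2400),  # dominated by "b": same score, more tokens
    Trace("b", 0.72, 1100),
    Trace("c", 0.70, 600),
    Trace("d", 0.65, 900),   # dominated by "c": lower score, more tokens
]
print([t.text for t in pareto_front(samples)])      # ['b', 'c']
print(select_trace(samples).text)                   # b
print(select_trace(samples, tolerance=0.05).text)   # c
```

The `tolerance` knob is one simple way to trade a small amount of ranking quality for much shorter rationales; a real pipeline might instead weight score against length or apply the filter per query before building the fine-tuning set.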
- LLMs in listwise reranking exhibit 'overthinking' where longer Chain-of-Thought reasoning yields diminishing returns.
- Length-Regularized Self-Distillation (LRSD) trains a student on Pareto-filtered reasoning traces, cutting token usage by 34%–37%.
- Evaluated on TREC Deep Learning and NeuCLIR benchmarks, LRSD retains teacher-level effectiveness while reducing compute cost.
Why It Matters
Enables faster, cheaper AI search reranking without sacrificing quality, which is critical for real-time applications.