Research & Papers

Efficient Listwise Reranking with Compressed Document Representations

New listwise reranker compresses documents for 3-18x speed boost

Deep Dive

Researchers Hervé Déjean and Stéphane Clinchant have introduced RRK, a new listwise reranker that compresses each document into a fixed-size, multi-token embedding representation. The approach addresses the computational expense of reranking with Large Language Models (LLMs), drawing on recent advances in context compression for retrieval-augmented generation (RAG). The model is trained via distillation, combining these rich compressed representations with listwise reranking to achieve both high efficiency and strong effectiveness.
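To make the idea concrete, here is a minimal sketch of the two ingredients described above: compressing a variable-length document into a fixed number of embedding slots, then scoring a whole candidate list against a query over those compressed slots. This is an illustrative toy, not the RRK implementation: RRK learns its compression and scores with an LLM, whereas the chunk mean-pooling and dot-product scoring below are hypothetical stand-ins.

```python
import numpy as np

def compress_document(token_embs: np.ndarray, k: int = 4) -> np.ndarray:
    """Compress a variable-length document (T, d) into k fixed vectors
    by mean-pooling k contiguous chunks. RRK instead *learns* this
    compression; chunk pooling here is only an illustrative stand-in."""
    chunks = np.array_split(token_embs, k)
    return np.stack([c.mean(axis=0) for c in chunks])  # shape (k, d)

def listwise_scores(query_emb: np.ndarray,
                    compressed_docs: list[np.ndarray]) -> list[float]:
    """Score all candidates jointly. Each document contributes only its
    k compressed slots, so the scoring context grows as k * n_docs
    rather than with the sum of full document lengths."""
    scores = []
    for doc in compressed_docs:            # doc: (k, d)
        sims = doc @ query_emb             # relevance of each slot
        scores.append(float(sims.max()))   # best-slot match (illustrative)
    return scores
```

The point of the sketch is the cost structure: with full-text reranking, long documents inflate the LLM's input linearly with document length, while the compressed representation keeps every document at a constant k tokens, which is why the efficiency gap widens on long-document benchmarks.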

The 8B-parameter RRK model runs 3-18x faster than smaller rerankers with 0.6 to 4 billion parameters, while matching or outperforming them in effectiveness. The efficiency gains are even more pronounced on long-document benchmarks. This development could significantly reduce the computational cost of reranking in information retrieval systems, making it more accessible for real-world applications.

Key Points
  • RRK compresses documents into multi-token fixed-size embeddings for efficient reranking
  • 8B-parameter model runs 3x-18x faster than smaller rerankers (0.6-4B parameters)
  • Efficiency gains are largest on long-document benchmarks

Why It Matters

RRK makes LLM-based reranking practical for long documents, cutting costs without sacrificing accuracy.