Research & Papers

Layer-wise Token Compression Boosts Reranker Speed by Up to 116%

New token compression method speeds up document reranking without losing quality.

Deep Dive

Researchers propose Layer-wise Token Compression (LTC) for cross-encoder rerankers. By adaptively pooling tokens at intermediate transformer layers, LTC achieves queries-per-second (QPS) gains of up to 25% on MS MARCO passage ranking and 116% on document ranking. The method also extends to listwise LLM rerankers. Surprisingly, compressed models outperform uncompressed ones on long-document tasks, with the compression acting as a regularizer.
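To make the idea of pooling at an intermediate layer concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the summary does not describe the adaptive pooling rule, so a fixed-window mean pool stands in for it. The names compress_tokens, run_encoder, num_query_tokens, window, and compress_at are hypothetical, and leaving the query tokens uncompressed is an assumption made for illustration.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty group of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def compress_tokens(hidden: List[Vector], num_query_tokens: int, window: int) -> List[Vector]:
    """Shorten a sequence of hidden states.

    Assumption (not from the paper): the first num_query_tokens states are kept
    unchanged and the remaining document states are mean-pooled in fixed windows.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    query_states = hidden[:num_query_tokens]
    doc_states = hidden[num_query_tokens:]
    pooled = [
        mean_pool(doc_states[i:i + window])
        for i in range(0, len(doc_states), window)
    ]
    return query_states + pooled


def run_encoder(hidden: List[Vector], layers: List[Layer], compress_at: int,
                num_query_tokens: int, window: int) -> List[Vector]:
    """Apply layers in order, compressing once just before layer compress_at."""
    for index, layer in enumerate(layers):
        if index == compress_at:
            hidden = compress_tokens(hidden, num_query_tokens, window)
        hidden = layer(hidden)
    return hidden


if __name__ == "__main__":
    # Toy input: 2 query states followed by 6 document states, each of width 2.
    states = [[float(i), 2.0 * i] for i in range(8)]
    identity_layers: List[Layer] = [lambda h: h] * 4  # stand-ins for transformer layers
    output = run_encoder(states, identity_layers, compress_at=2,
                         num_query_tokens=2, window=2)
    print(len(states), "states in,", len(output), "states out")
```

Every layer after the compression point processes a shorter sequence, which is where the throughput gain would come from. That also fits the larger reported gain on document ranking, where inputs are longer.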

Key Points
  • LTC applies adaptive token pooling at intermediate transformer layers, not just the input layer.
  • On MS MARCO, LTC boosts QPS by up to 25% for passage ranking and 116% for document ranking.
  • Compressed models outperform uncompressed ones on long-document tasks, with the compression acting as a regularizer.
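
For readers who think in latency or cost rather than throughput, the QPS figures above can be converted with simple arithmetic. The sketch below assumes fixed hardware and that the average compute time per query is the inverse of QPS; the function names are hypothetical.

```python
def speedup_factor(qps_gain_pct: float) -> float:
    """Convert a percentage QPS gain into a throughput multiplier."""
    return 1.0 + qps_gain_pct / 100.0


def time_saved_pct(qps_gain_pct: float) -> float:
    """Percentage reduction in average compute time per query.

    Assumes time per query is 1 / QPS on unchanged hardware.
    """
    return 100.0 * (1.0 - 1.0 / speedup_factor(qps_gain_pct))


if __name__ == "__main__":
    for task, gain in [("passage ranking", 25.0), ("document ranking", 116.0)]:
        print(f"{task}: {speedup_factor(gain):.2f}x throughput, "
              f"{time_saved_pct(gain):.1f}% less time per query")
```

Under these assumptions, a 25% QPS gain corresponds to 1.25x throughput and about 20% less compute time per query, while a 116% gain corresponds to 2.16x throughput and roughly 54% less time per query.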

Why It Matters

Faster reranking without quality loss means cheaper and more scalable search in production systems.