Layer-wise Token Compression Boosts Reranker Throughput by Up to 116%

arXiv cs.IR May 21, 2026

⚡ A new token compression method speeds up reranking without losing quality.

Deep Dive

Researchers propose Layer-wise Token Compression (LTC) for cross-encoder rerankers. Rather than shortening only the input, LTC adaptively pools tokens at intermediate transformer layers, so later layers process shorter sequences. On MS MARCO, this raises throughput in queries per second (QPS) by up to 25% for passage ranking and 116% for document ranking. The method also extends to listwise LLM rerankers. Surprisingly, compressed models outperform uncompressed ones on long-document tasks, suggesting that compression acts as a regularizer.
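
The article does not describe LTC's actual pooling rule, so what follows is only a minimal sketch of the general idea of pooling tokens between layers, not the authors' method. It swaps in a hypothetical similarity-threshold criterion: after chosen layers, each adjacent token vector whose cosine similarity to the running group mean reaches `threshold` is mean-pooled into that group, while a leading `[CLS]`-style token is kept intact. The names `pool_similar_tokens`, `encode_with_compression`, `threshold`, `keep_prefix`, and `compress_after` are all illustrative, and plain Python lists stand in for real hidden-state tensors.

```python
from typing import Callable, Collection, List, Sequence

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def pool_similar_tokens(hidden: List[Vector], threshold: float = 0.9,
                        keep_prefix: int = 1) -> List[Vector]:
    """Greedily mean-pool runs of adjacent, similar token vectors (hypothetical rule).

    The first `keep_prefix` tokens (e.g. a [CLS] token read by the ranking head)
    are never merged. A token joins the current run if its cosine similarity to
    the run's mean is at least `threshold`; otherwise it starts a new run.
    """
    out = [list(v) for v in hidden[:keep_prefix]]
    run: List[Vector] = []
    for vec in hidden[keep_prefix:]:
        if run:
            mean = [sum(col) / len(run) for col in zip(*run)]
            if _cosine(mean, vec) >= threshold:
                run.append(vec)
                continue
            out.append(mean)  # close the previous run as one pooled token
        run = [vec]
    if run:
        out.append([sum(col) / len(run) for col in zip(*run)])
    return out


def encode_with_compression(layers: Sequence[Layer], hidden: List[Vector],
                            compress_after: Collection[int],
                            threshold: float = 0.9) -> List[Vector]:
    """Run `layers` in order, pooling tokens after each layer index in `compress_after`."""
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        if i in compress_after:
            hidden = pool_similar_tokens(hidden, threshold)
    return hidden


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [0.0, 1.0]]
    identity_layers = [lambda h: h] * 4  # stand-ins for transformer layers
    out = encode_with_compression(identity_layers, tokens, compress_after={1})
    print(len(tokens), "->", len(out))  # 5 -> 3
```

Because self-attention cost grows quadratically with sequence length, every layer after a compression point runs on fewer tokens, which is where a speedup of this kind would come from. Compressing later keeps more token-level detail in early layers but saves less compute.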

Key Points

LTC applies adaptive token pooling at intermediate transformer layers, not just the input layer.
On MS MARCO, LTC boosts QPS by 25% for passage ranking and 116% for document ranking.
Compressed models outperform uncompressed ones on long-document tasks, suggesting that compression acts as a regularizer.
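
A percentage QPS gain is easy to misread as a latency figure. The short conversion below assumes compute-bound serving, where per-query compute time is simply the inverse of throughput; that assumption is ours, not the article's. Under it, a 116% gain means 2.16× the original QPS, or roughly 46% of the original compute time per query. The function name `qps_gain_to_speedup` is illustrative.

```python
def qps_gain_to_speedup(gain_pct: float) -> tuple[float, float]:
    """Convert a percentage QPS gain into (throughput multiplier, relative time per query).

    Assumes compute-bound serving, where time per query is 1 / throughput.
    """
    speedup = 1.0 + gain_pct / 100.0
    return speedup, 1.0 / speedup


for task, gain in [("passage ranking", 25.0), ("document ranking", 116.0)]:
    speedup, rel_time = qps_gain_to_speedup(gain)
    # passage ranking: 1.25x QPS, 80%; document ranking: 2.16x QPS, 46%
    print(f"{task}: {speedup:.2f}x QPS, {rel_time:.0%} of original time per query")
```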

Why It Matters

Faster reranking without quality loss means cheaper and more scalable search in production systems.
