DualView Reranking Hits 99.4% Recall at 4ms, Beats 600M Models
A lightweight dual-view reranker makes multi-hop retrieval 5 to 6 times faster than large cross-encoders.
In a preprint on arXiv, researchers Zhang, Li, and Zhao introduce DualView, a cascaded reranking framework designed specifically for multi-hop document retrieval. Multi-hop question answering requires aggregating information from multiple documents, a task where traditional rerankers struggle with both recall and speed. DualView addresses this by combining two views: a Local Scorer that uses stacked cross-attention to measure fine-grained query-document relevance, and a Global Scorer that models inter-document dependencies via a Transformer-based context aggregator. An Adaptive Gate conditioned on query semantics blends the two views dynamically, so the weighting adapts to the complexity of each query.
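The gated blending described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the gate is a single scalar in (0, 1) produced by a sigmoid over a linear projection of the query embedding, and that the two views are combined as a convex mixture of per-document scores. The function names (`adaptive_gate`, `fuse_scores`, `top_k`) and the parameters `gate_weights` and `gate_bias` are hypothetical.

```python
import math

def adaptive_gate(query_vec, gate_weights, gate_bias):
    """Map a query embedding to a scalar gate in (0, 1).

    Assumption: a sigmoid over a linear projection. The paper's actual
    gate may be richer (e.g. an MLP or a per-document gate).
    """
    z = sum(q * w for q, w in zip(query_vec, gate_weights)) + gate_bias
    return 1.0 / (1.0 + math.exp(-z))

def fuse_scores(local_scores, global_scores, gate):
    """Convex mixture of the two views, one fused score per candidate."""
    return [gate * l + (1.0 - gate) * g
            for l, g in zip(local_scores, global_scores)]

def top_k(scores, k=4):
    """Indices of the k highest-scoring candidates, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Toy example: three candidates, hypothetical scores from each view.
local_scores = [0.9, 0.1, 0.4]    # stand-in for Local Scorer output
global_scores = [0.1, 0.9, 0.4]   # stand-in for Global Scorer output
gate = adaptive_gate([0.0, 0.0], [1.0, 1.0], 0.0)  # zero logit gives 0.5
fused = fuse_scores(local_scores, global_scores, gate)
ranking = top_k(fused, k=2)
```

A gate near 1 leans on the local, per-document view; a gate near 0 leans on the global, cross-document view. Under this assumed form, a query whose answer depends on how documents relate to each other would be expected to receive a lower gate value.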
Under a fixed candidate set with offline cached embeddings, DualView reaches 99.4% Top-4 Recall and 97.8% Full Hit accuracy on the MuSiQue dataset at a latency of 4.0 ms (249 queries per second). That is well ahead of larger cross-encoders such as BGE-Large (92.0% Recall) and Jina-v3 (90.1% Recall), and 5 to 6 times faster. Ablation studies confirm that both the local and global views contribute to multi-hop performance. For practitioners building retrieval-augmented generation (RAG) systems, DualView offers a fast alternative that does not trade away accuracy in this setting.
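To make the two reported metrics concrete, the sketch below computes them under assumed definitions, since the summary does not spell them out: Top-k Recall is taken as the fraction of gold supporting documents that appear in the top k, averaged over queries, and Full Hit as the fraction of queries for which every gold document appears in the top k. The function names and the toy data are hypothetical.

```python
def recall_at_k(ranked_ids, gold_ids, k=4):
    """Fraction of gold documents found in the top k (assumes gold_ids is non-empty)."""
    top = set(ranked_ids[:k])
    return len(top & set(gold_ids)) / len(gold_ids)

def full_hit_at_k(ranked_ids, gold_ids, k=4):
    """True only if every gold document is in the top k."""
    return set(gold_ids) <= set(ranked_ids[:k])

def evaluate(runs, k=4):
    """Average both metrics over (ranked_ids, gold_ids) pairs."""
    n = len(runs)
    recall = sum(recall_at_k(r, g, k) for r, g in runs) / n
    full_hit = sum(full_hit_at_k(r, g, k) for r, g in runs) / n
    return recall, full_hit

# Toy data: two 2-hop queries, each with five reranked candidates.
runs = [
    (["a", "b", "c", "d", "e"], ["a", "c"]),  # both gold docs in top 4
    (["x", "y", "z", "w", "v"], ["x", "v"]),  # one gold doc falls to rank 5
]
recall, full_hit = evaluate(runs, k=4)
```

Under these definitions Full Hit can never exceed Top-k Recall, which is consistent with the reported 97.8% versus 99.4%. On throughput, a 4.0 ms per-query latency corresponds to roughly 1000 / 4.0 = 250 queries per second if queries are processed one at a time, in line with the reported 249.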
- DualView achieves 99.4% Top-4 Recall on MuSiQue with only 4.0 ms latency (249 QPS).
- Outperforms 600M-parameter cross-encoders such as BGE-Large (92.0% Recall) while running 5 to 6 times faster.
- Adaptively fuses local cross-attention and global Transformer-based context aggregation via a query-conditioned gate.
Why It Matters
Delivers near-perfect multi-hop document retrieval on MuSiQue at production-level latency, which could remove a major reranking bottleneck in RAG pipelines.