DualView Reranking Hits 99.4% Recall at 4ms, Beats 600M Models

arXiv cs.IR May 20, 2026

⚡Multi-hop QA just got 5x faster with a lightweight dual-view reranker.

Deep Dive

Researchers Zhang, Li, and Zhao have posted DualView to arXiv (cs.IR): a cascaded reranking framework designed for multi-hop document retrieval. Multi-hop question answering requires aggregating information from several documents, a setting where conventional rerankers struggle with both recall and speed. DualView combines two views: a Local Scorer that uses stacked cross-attention to measure fine-grained query-document relevance, and a Global Scorer that models inter-document dependencies with a Transformer-based context aggregator. An Adaptive Gate conditioned on query semantics blends the two scores, so the weighting between the views can vary from query to query.
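To make the fusion step concrete, here is a minimal sketch of how a query-conditioned gate could blend per-document local and global scores. It is an illustration under stated assumptions, not the authors' implementation: the summary does not give the gate's formula, so the single logistic unit, the convex blend, and every name below (`gate_from_query`, `fuse_scores`, `rerank`) are hypothetical. The local and global scores are taken as already computed.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_from_query(query_vec: Sequence[float],
                    gate_weights: Sequence[float],
                    gate_bias: float = 0.0) -> float:
    """Hypothetical gate: one logistic unit over the query embedding.

    Assumption: the article only says the gate is conditioned on query
    semantics; the real gate may be a deeper network.
    """
    logit = sum(q * w for q, w in zip(query_vec, gate_weights)) + gate_bias
    return sigmoid(logit)


def fuse_scores(local_scores: Sequence[float],
                global_scores: Sequence[float],
                gate: float) -> List[float]:
    """Assumed fusion form: gate * local + (1 - gate) * global, per document."""
    return [gate * loc + (1.0 - gate) * glo
            for loc, glo in zip(local_scores, global_scores)]


def rerank(doc_ids: Sequence[str],
           local_scores: Sequence[float],
           global_scores: Sequence[float],
           gate: float,
           top_k: int = 4) -> List[str]:
    """Return the top_k document ids ordered by fused score, highest first."""
    fused = fuse_scores(local_scores, global_scores, gate)
    ranked = sorted(zip(doc_ids, fused), key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]
```

In this sketch, a gate near 1 makes the ranking follow the local scores and a gate near 0 makes it follow the global scores. That is one simple way to realize a per-query blend of the two views.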

Under a fixed candidate set with offline cached embeddings, DualView reaches 99.4% Top-4 Recall and 97.8% Full Hit accuracy on the MuSiQue dataset at a latency of 4.0 ms (249 queries per second). That is higher recall than larger cross-encoders such as BGE-Large (92.0% Recall) and Jina-v3 (90.1% Recall), at 5 to 6 times the speed. Ablation studies indicate that both the local and global views contribute to multi-hop performance. For practitioners building retrieval-augmented generation (RAG) systems, DualView offers a fast reranking option that, in this setting, does not trade away accuracy.
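The summary does not define the reported metrics, so the sketch below shows one common reading, offered as an assumption: Top-k Recall is the share of a question's gold supporting documents that appear in the top k results, and Full Hit means all of them appear. The function names are hypothetical. The last helper checks the latency figure, assuming queries are processed one at a time.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[str], gold: Sequence[str], k: int = 4) -> float:
    """Fraction of gold documents found in the top k results (assumed definition)."""
    if not gold:
        raise ValueError("gold must contain at least one document id")
    top = set(ranked[:k])
    return len(top & set(gold)) / len(set(gold))


def full_hit_at_k(ranked: Sequence[str], gold: Sequence[str], k: int = 4) -> bool:
    """True only if every gold document is in the top k (assumed definition)."""
    return set(gold) <= set(ranked[:k])


def qps_from_latency_ms(latency_ms: float) -> float:
    """Throughput implied by a per-query latency, assuming sequential processing."""
    return 1000.0 / latency_ms
```

Under the sequential assumption, 4.0 ms per query corresponds to 250 QPS. The reported 249 QPS is consistent with a latency of about 4.02 ms rounded to 4.0 ms.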

Key Points

DualView achieves 99.4% Top-4 Recall on MuSiQue with only 4.0 ms latency (249 QPS).
Outperforms 600M-parameter cross-encoders such as BGE-Large (92.0% Recall) on recall while running at 5-6x lower latency.
Adaptively fuses local cross-attention and global Transformer-based context aggregation via a query-conditioned gate.

Why It Matters

Near-perfect multi-hop document retrieval at roughly 4 ms per query puts high-recall reranking within production latency budgets for RAG pipelines.
