BridgeRAG: Training-Free Bridge-Conditioned Retrieval for Multi-Hop Question Answering
New AI retrieval method uses a 'bridge-conditioned' LLM judge to solve complex questions in two steps.
Researcher Andre Bacellar has introduced BridgeRAG, a novel, training-free retrieval method designed to solve a core weakness in current RAG (retrieval-augmented generation) systems: multi-hop question answering. Traditional RAG often fails on questions requiring multiple steps of reasoning (e.g., 'What instrument did the composer of the soundtrack for *Inception* play?'), because it retrieves documents based solely on similarity to the original query. BridgeRAG reframes this as a bridge-conditioned problem. It first finds relevant 'bridge' documents, then uses a large language model as a judge to score 'candidate' documents based on their utility given that bridge, not just the query. This approach requires no offline training or construction of complex knowledge graphs.
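The two-step idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy word-overlap retriever and the `llm_judge` stand-in (which a real system would replace with an actual LLM call and prompt) are assumptions made for the example.

```python
# Minimal sketch of bridge-conditioned retrieval.
# Hypothetical helpers: retrieve() is a toy lexical retriever and llm_judge()
# is a stand-in for the paper's LLM judge; both are assumptions for this demo.

def retrieve(query, corpus, k):
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: -len(q & set(d.lower().split())))[:k]

def llm_judge(query, bridge, candidate):
    """Stand-in judge: score a candidate's utility GIVEN the bridge document
    (here just word overlap with the bridge; a real system would prompt an
    LLM with the query, bridge, and candidate)."""
    return len(set(bridge.lower().split()) & set(candidate.lower().split()))

def bridge_conditioned_retrieve(query, corpus, k_bridge=1, k_cand=5, top_n=2):
    bridges = retrieve(query, corpus, k_bridge)       # hop 1: bridge documents
    candidates = retrieve(query, corpus, k_cand)      # broad candidate pool
    scored = [(max(llm_judge(query, b, c) for b in bridges), c)
              for c in candidates]
    scored.sort(key=lambda t: -t[0])                  # hop 2: judge-ranked
    return [c for _, c in scored[:top_n]]

# Toy corpus for the Inception example from the text.
corpus = [
    "Hans Zimmer composed the soundtrack for Inception",
    "Inception is a 2010 film directed by Christopher Nolan",
    "Hans Zimmer plays the piano and guitar",
]
query = "What instrument did the composer of the soundtrack for Inception play?"
top2 = bridge_conditioned_retrieve(query, corpus)
```

In this toy run, query-only similarity would rank the film-trivia document ahead of the second-hop document about what Hans Zimmer plays; conditioning the judge on the bridge ("Hans Zimmer composed the soundtrack for Inception") pulls the second-hop document into the top results.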
The method's effectiveness is demonstrated across three standard benchmarks. BridgeRAG achieved a Recall@5 score of 0.8146 on MuSiQue, outperforming PropRAG by 3.1 percentage points and HippoRAG2 by 6.8 percentage points. It also set new training-free records on 2WikiMultiHopQA (0.9527) and HotpotQA (0.9875). Crucially, the gains are selective: performance improved by 2.55 percentage points on 'parallel-chain' queries where multiple reasoning paths exist, but showed minimal change on simpler 'single-chain' questions. This selectivity, along with an 18.7% 'flip-win' rate on complex queries, indicates the system is intelligently re-ranking candidates rather than merely adding noise.
BridgeRAG combines a broad candidate search via dual-entity ANN expansion with the precision of the LLM judge. The research shows the bridge signal is irreplaceable—substituting it with generated text hurt performance—and predictable, with the cosine similarity between the bridge and the ground-truth second-hop document correlating with performance gains. By separating coverage from scoring and fusing percentile-rank scores, BridgeRAG provides a powerful, plug-and-play upgrade for existing RAG pipelines tackling complex information needs.
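Percentile-rank fusion can be illustrated with a short sketch: each signal's raw scores are mapped to percentile ranks before averaging, so scores on incompatible scales (cosine similarities vs. judge scores) become comparable. The function names, the equal weighting, and the tie behavior are assumptions for this example, not details from the paper.

```python
# Hedged sketch of percentile-rank fusion (helper names and the 50/50
# weighting are assumptions; the paper's exact scheme may differ).

def percentile_ranks(scores):
    """Map each score to its percentile rank in [0, 1] within the list."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for pos, i in enumerate(order):
        ranks[i] = pos / (n - 1) if n > 1 else 1.0
    return ranks

def fuse(query_scores, judge_scores, w_judge=0.5):
    """Combine query-similarity scores with bridge-conditioned judge scores,
    per document, after converting each list to percentile ranks."""
    q = percentile_ranks(query_scores)
    j = percentile_ranks(judge_scores)
    return [(1 - w_judge) * qi + w_judge * ji for qi, ji in zip(q, j)]

# Cosine similarities in [0, 1] and raw judge scores on an arbitrary scale:
fused = fuse([0.9, 0.40, 0.41], [10, 1, 7])
```

Because both signals are reduced to ranks, a judge score of 10 and a cosine similarity of 0.9 contribute on the same footing, which is what lets the judge's bridge-conditioned signal re-order candidates that embedding similarity alone ranks nearly identically.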
- Achieves state-of-the-art training-free scores on three multi-hop QA benchmarks, including a 6.8pp gain over HippoRAG2 on MuSiQue.
- Uses an LLM as a 'bridge-conditioned' judge to score candidate documents, eliminating the need for pre-built graph databases or model training.
- Shows selective improvement on complex 'parallel-chain' queries (+2.55pp) by productively re-ranking candidates, not just increasing retrieval churn.
Why It Matters
Enables RAG systems to answer complex, multi-step questions with higher accuracy, without the infrastructure cost of graph databases or model fine-tuning.