PAR²-RAG: Planned Active Retrieval and Reasoning for Multi-Hop Question Answering
New AI research separates evidence gathering from commitment, achieving major gains on complex reasoning tasks.
A team led by Xingyu Li and Dan Roth from the University of Pennsylvania has introduced PAR²-RAG (Planned Active Retrieval and Reasoning RAG), a new framework targeting a critical weakness of current LLMs: multi-hop question answering (MHQA). MHQA requires models to retrieve and logically combine evidence from multiple documents, a task where existing systems often fail by locking onto an incorrect initial retrieval path or by following rigid, non-adaptive plans. PAR²-RAG's key innovation is separating the process into two distinct phases, 'coverage' and 'commitment', to avoid these pitfalls.
In the first 'anchoring' stage (the coverage phase), the system performs a breadth-first search to gather a wide, high-recall set of potential evidence, establishing a robust frontier of information. The second 'refinement' stage (the commitment phase) then iteratively performs depth-first reasoning over this frontier, with a built-in control mechanism that checks for evidence sufficiency before committing to an answer. This two-stage approach prevents the error amplification common in purely iterative methods and offers more flexibility than static planning systems.
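The coverage-then-commitment control flow described above can be sketched roughly as follows. This is a minimal illustrative toy, not the paper's implementation: the word-overlap retriever, the toy corpus, the sub-query list, and the `toy_sufficiency` check are all hypothetical stand-ins for PAR²-RAG's actual planner, retriever, and sufficiency criterion.

```python
# Illustrative sketch of a two-stage 'coverage then commitment' loop.
# All names, the corpus, and the sufficiency rule are assumptions for
# demonstration; they do not come from the PAR²-RAG paper.

TOY_CORPUS = {
    "d1": "Marie Curie won the Nobel Prize in Physics in 1903.",
    "d2": "Marie Curie was born in Warsaw.",
    "d3": "Warsaw is the capital of Poland.",
}

def retrieve(query, corpus, k=2):
    """Stand-in lexical retriever: rank documents by word overlap."""
    words = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: -len(words & set(kv[1].lower().split())),
    )
    return [doc_id for doc_id, _ in scored[:k]]

def coverage_stage(question, corpus, sub_queries):
    """Stage 1 (anchoring): breadth-first gathering of a high-recall
    evidence frontier across the question and its sub-queries."""
    frontier = []
    for q in [question] + sub_queries:
        for doc_id in retrieve(q, corpus):
            if doc_id not in frontier:
                frontier.append(doc_id)
    return frontier

def commitment_stage(question, frontier, corpus, is_sufficient, max_steps=3):
    """Stage 2 (refinement): step through the frontier depth-first and
    only commit once the sufficiency check passes."""
    evidence = []
    for doc_id in frontier[:max_steps]:
        evidence.append(corpus[doc_id])
        if is_sufficient(question, evidence):
            return evidence  # commit: evidence judged sufficient
    return evidence  # fall back to everything gathered

def toy_sufficiency(question, evidence):
    # Placeholder criterion: require at least two linked evidence pieces.
    return len(evidence) >= 2

question = "In which country was the 1903 Physics Nobel laureate born?"
frontier = coverage_stage(
    question, TOY_CORPUS,
    sub_queries=["Nobel Prize Physics 1903", "Marie Curie born"],
)
evidence = commitment_stage(question, frontier, TOY_CORPUS, toy_sufficiency)
print(len(frontier), len(evidence))  # → 2 2
```

The point of the split is visible even in this toy: stage 1 only widens the candidate set and never commits, while stage 2 decides when the gathered chain (prize winner → birthplace) is enough, rather than answering after the first retrieval.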
The results are significant. Evaluated on four standard MHQA benchmarks, PAR²-RAG consistently outperformed leading baselines. Compared to the prominent IRCoT (Iterative Retrieval with Chain-of-Thought) method, it achieved accuracy gains of up to 23.5% and improved retrieval quality (measured by NDCG) by up to 10.5%. This represents a substantial leap in making RAG systems more reliable for complex, real-world queries that require connecting disparate pieces of information.
- Two-stage 'coverage then commitment' design prevents error amplification in multi-step reasoning.
- Outperforms state-of-the-art IRCoT by up to 23.5% accuracy on four MHQA benchmarks.
- Achieves up to 10.5% higher NDCG scores, indicating significantly better retrieval quality.
Why It Matters
Makes AI assistants more reliable for complex research, analysis, and support tasks requiring synthesis of multiple sources.