PAR²-RAG: Planned Active Retrieval and Reasoning for Multi-Hop Question Answering
New AI research separates evidence gathering from commitment, achieving major gains on complex reasoning tasks.
A team led by Xingyu Li and Dan Roth from the University of Pennsylvania has introduced PAR²-RAG (Planned Active Retrieval and Reasoning RAG), a new framework targeting a critical weakness of current LLMs: multi-hop question answering (MHQA). MHQA requires models to retrieve and logically combine evidence from multiple documents, a task where existing systems often fail by locking onto an incorrect initial retrieval path or by following rigid, non-adaptive plans. PAR²-RAG's key innovation is separating the process into two distinct phases, 'coverage' and 'commitment', to avoid these pitfalls.
In the first 'anchoring' stage (the coverage phase), the system performs a breadth-first search to gather a wide, high-recall set of potential evidence, establishing a robust frontier of information. The second 'refinement' stage (the commitment phase) then iteratively performs depth-first reasoning over this frontier, with a built-in control mechanism that checks for evidence sufficiency before committing to an answer. This two-stage approach prevents the error amplification common in purely iterative methods and offers more flexibility than static planning systems.
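The coverage-then-commitment control flow described above can be sketched roughly as follows. This is a minimal illustrative toy, not the paper's implementation: the word-overlap retriever, the toy corpus, the sub-query list, and the `toy_sufficiency` check are all hypothetical stand-ins for PAR²-RAG's actual planner, retriever, and sufficiency criterion.

```python
# Illustrative sketch of a two-stage 'coverage then commitment' loop.
# All names, the corpus, and the sufficiency rule are assumptions for
# demonstration; they do not come from the PAR²-RAG paper.

TOY_CORPUS = {
    "d1": "Marie Curie won the Nobel Prize in Physics in 1903.",
    "d2": "Marie Curie was born in Warsaw.",
    "d3": "Warsaw is the capital of Poland.",
}

def retrieve(query, corpus, k=2):
    """Stand-in lexical retriever: rank documents by word overlap."""
    words = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: -len(words & set(kv[1].lower().split())),
    )
    return [doc_id for doc_id, _ in scored[:k]]

def coverage_stage(question, corpus, sub_queries):
    """Stage 1 (anchoring): breadth-first gathering of a high-recall
    evidence frontier across the question and its sub-queries."""
    frontier = []
    for q in [question] + sub_queries:
        for doc_id in retrieve(q, corpus):
            if doc_id not in frontier:
                frontier.append(doc_id)
    return frontier

def commitment_stage(question, frontier, corpus, is_sufficient, max_steps=3):
    """Stage 2 (refinement): step through the frontier depth-first and
    only commit once the sufficiency check passes."""
    evidence = []
    for doc_id in frontier[:max_steps]:
        evidence.append(corpus[doc_id])
        if is_sufficient(question, evidence):
            return evidence  # commit: evidence judged sufficient
    return evidence  # fall back to everything gathered

def toy_sufficiency(question, evidence):
    # Placeholder criterion: require at least two linked evidence pieces.
    return len(evidence) >= 2

question = "In which country was the 1903 Physics Nobel laureate born?"
frontier = coverage_stage(
    question, TOY_CORPUS,
    sub_queries=["Nobel Prize Physics 1903", "Marie Curie born"],
)
evidence = commitment_stage(question, frontier, TOY_CORPUS, toy_sufficiency)
print(len(frontier), len(evidence))  # → 2 2
```

The point of the split is visible even in this toy: stage 1 only widens the candidate set and never commits, while stage 2 decides when the gathered chain (prize winner → birthplace) is enough, rather than answering after the first retrieval.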
The results are significant. Evaluated on four standard MHQA benchmarks, PAR²-RAG consistently outperformed leading baselines. Compared to the prominent IRCoT (Iterative Retrieval with Chain-of-Thought) method, it achieved accuracy gains of up to 23.5% and improved retrieval quality (measured by NDCG) by up to 10.5%. This represents a substantial leap in making RAG systems more reliable for complex, real-world queries that require connecting disparate pieces of information.
- Two-stage 'coverage then commitment' design prevents error amplification in multi-step reasoning.
- Outperforms state-of-the-art IRCoT by up to 23.5% accuracy on four MHQA benchmarks.
- Achieves up to 10.5% higher NDCG scores, indicating significantly better retrieval quality.
Why It Matters
Makes AI assistants more reliable for complex research, analysis, and support tasks requiring synthesis of multiple sources.