Structure Guided Retrieval-Augmented Generation for Factual Queries
New structure-guided RAG boosts factual query accuracy by up to 50.88 points
A team of researchers (Miao Xie, Xiao Zhang, Yi Li, Chunli Lv) has introduced a new retrieval-augmented generation paradigm called Structure Guided Retrieval-Augmented Generation (SG-RAG). The paper, published on arXiv, tackles a fundamental weakness of existing RAG systems: their reliance on vector similarity for retrieval. Similarity-based retrieval often introduces semantic noise and returns results that fail to satisfy every condition in a complex factual query, which leads to incorrect answers. The authors formally define this challenge as the Exact Retrieval Problem (ERP), a formulation that, according to the paper, explicitly incorporates structural information into the retrieval process for the first time.
SG-RAG reframes retrieval as an embedding-based subgraph matching task and uses the retrieved topological structures to guide LLMs toward answers that meet every condition specified in the query. To benchmark the approach, the team built and released ERQA (Exact Retrieval Question Answering), a large-scale dataset of 120,000 fact-oriented QA pairs spanning 20 diverse domains, each involving complex multi-condition queries. The reported experiments show absolute improvements of 20.68 to 50.88 points across all evaluation metrics over strong baselines, with what the authors describe as reasonable computational overhead. The work is a step toward making RAG systems more reliable for factual information retrieval.
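To make the contrast concrete, here is a minimal Python sketch of why ranking by similarity can miss a multi-condition query while a structural match cannot. It is an illustration only, not the SG-RAG method: the token-overlap scorer is a crude stand-in for vector similarity, the matcher works on exact symbolic triples rather than embeddings, and the triples, entity names, and function names are all hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical toy knowledge graph; not taken from ERQA.
TRIPLES: Set[Triple] = {
    ("drug_a", "treats", "migraine"),
    ("drug_a", "approved_in", "2015"),
    ("drug_b", "treats", "migraine"),
    ("drug_b", "approved_in", "2019"),
    ("drug_c", "treats", "asthma"),
    ("drug_c", "approved_in", "2019"),
}


def top_k_by_overlap(triples: Iterable[Triple], query_tokens: List[str], k: int) -> List[Triple]:
    """Rank triples by token overlap with the query (stand-in for vector similarity)."""
    tokens = set(query_tokens)
    # Sort by descending overlap, then alphabetically so ties are deterministic.
    ranked = sorted(triples, key=lambda t: (-len(tokens & set(t)), t))
    return ranked[:k]


def match_all_conditions(triples: Iterable[Triple], conditions: List[Tuple[str, str]]) -> List[str]:
    """Return subjects that satisfy every (relation, object) condition.

    This is a star-shaped subgraph match: one unknown node with one
    required edge per condition.
    """
    triples = list(triples)
    candidates: Optional[Set[str]] = None
    for rel, obj in conditions:
        hits = {s for (s, r, o) in triples if r == rel and o == obj}
        candidates = hits if candidates is None else candidates & hits
    return sorted(candidates or set())


# Query: which drug treats migraine AND was approved in 2019?
query_tokens = ["treats", "migraine", "approved_in", "2019"]
conditions = [("treats", "migraine"), ("approved_in", "2019")]

similar = top_k_by_overlap(TRIPLES, query_tokens, k=4)
exact = match_all_conditions(TRIPLES, conditions)

print("similarity-style top-4 subjects:", sorted({s for s, _, _ in similar}))
print("structural match:", exact)
```

In this toy setup the similarity-style ranking surfaces triples about entities that satisfy only one of the two conditions, while the structural match returns only the entity that satisfies both. A real system would replace exact string comparison with learned embeddings and handle patterns larger than a single star.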
- SG-RAG models retrieval as embedding-based subgraph matching rather than simple vector similarity
- New ERQA dataset contains 120,000 complex QA pairs across 20 domains for benchmarking
- Absolute improvements of 20.68–50.88 points across all evaluation metrics over strong baselines
Why It Matters
This approach could make AI assistants far more reliable for complex factual queries in enterprise search and knowledge management.