Agent Frameworks

GRASP cuts token usage 40-50% while boosting multi-hop QA accuracy

New agentic retrieval system uses a three-layer graph to save tokens and improve answers.

Deep Dive

A team of researchers from the University of Wisconsin-Madison (Stockton Jenkins, Ramya Korlakai Vinayak, Junjie Hu) has released a paper on GRASP (Graph Agentic Search over Propositions), an agentic retrieval system designed to tackle multi-hop question answering (QA) with substantially lower token usage. Existing agentic retrieval methods either execute rigid single queries or rely on expensive knowledge graphs that increase both indexing cost and inference-time token consumption. GRASP instead decomposes each multi-hop query into a dependency-aware plan and scales the number of sub-agents to the complexity of the problem. Each sub-agent explores a three-layer hierarchical graph of entities, propositions, and passages, using the entity layer for targeted traversal and the proposition layer for high-recall passage retrieval via reciprocal-rank voting.
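To make the proposition-to-passage step concrete, here is a minimal sketch of reciprocal-rank voting. It is not the authors' implementation. The function name, the `proposition_to_passage` mapping, and the smoothing constant `k = 60` (borrowed from standard reciprocal rank fusion) are all assumptions for illustration. The idea is that each retrieved proposition casts a vote for its parent passage, with higher-ranked propositions counting for more.

```python
from collections import defaultdict


def reciprocal_rank_vote(ranked_propositions, proposition_to_passage, k=60, top_n=3):
    """Score passages by summing reciprocal-rank votes from their propositions.

    ranked_propositions: proposition ids, best match first (hypothetical input).
    proposition_to_passage: maps each proposition id to its parent passage id.
    k: smoothing constant; 60 is the common reciprocal rank fusion default,
       assumed here rather than taken from the GRASP paper.
    """
    scores = defaultdict(float)
    for rank, prop_id in enumerate(ranked_propositions, start=1):
        passage_id = proposition_to_passage[prop_id]
        # A proposition at rank r contributes 1 / (k + r) to its passage.
        scores[passage_id] += 1.0 / (k + rank)

    # Sort by descending score, breaking ties by passage id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [passage_id for passage_id, _ in ranked[:top_n]]


# Toy data: passage "B" owns two retrieved propositions, "A" and "C" one each.
mapping = {"p1": "A", "p2": "B", "p3": "B", "p4": "C"}
top_passages = reciprocal_rank_vote(["p1", "p2", "p3", "p4"], mapping)
print(top_passages)
```

In this toy case passage "B" outranks "A" even though "A" owns the single best proposition, because two mid-ranked votes outweigh one top-ranked vote. That aggregation is what makes a proposition layer useful for high-recall passage retrieval.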

GRASP was evaluated on three datasets (MuSiQue, 2WikiMultihopQA, and HotpotQA) under two settings: open-corpus retrieval and extended-context reasoning (LongBench). In open retrieval, GRASP achieved the highest QA accuracy on MuSiQue and 2WikiMultihopQA while using 40-50% fewer tokens than IRCoT+HippoRAG2. In the LongBench setting, GRASP led on Exact Match and F1 scores across all three datasets while consuming 30% fewer tokens than the next most accurate method. The paper also introduces 'success economy', an amortized token cost per correct answer weighted by difficulty, and advocates for efficiency-aware evaluation as a standard practice in agentic QA.
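The summary above does not give the formula for 'success economy', so the following is only one plausible reading of "amortized token cost per correct answer weighted by difficulty". It is not the paper's definition. The `Result` record, the `difficulty` weight (for example, a hop count), and the function name are hypothetical. The sketch divides all tokens spent, including those spent on wrong answers, by the difficulty-weighted count of correct answers.

```python
from dataclasses import dataclass


@dataclass
class Result:
    tokens: int        # tokens consumed answering this question
    correct: bool      # whether the final answer was correct
    difficulty: float  # hypothetical weight, e.g. number of hops


def tokens_per_weighted_success(results):
    """Total tokens spent divided by difficulty-weighted correct answers.

    Lower is better. Tokens spent on failed questions still count,
    which is what makes the cost "amortized" in this assumed reading.
    """
    total_tokens = sum(r.tokens for r in results)
    weighted_correct = sum(r.difficulty for r in results if r.correct)
    if weighted_correct == 0:
        return float("inf")  # no successes to amortize over
    return total_tokens / weighted_correct


runs = [
    Result(tokens=1000, correct=True, difficulty=2),
    Result(tokens=3000, correct=False, difficulty=4),
    Result(tokens=2000, correct=True, difficulty=3),
]
cost = tokens_per_weighted_success(runs)
print(cost)  # 6000 tokens / (2 + 3) weighted successes = 1200.0
```

Under this reading, a system that burns many tokens on questions it still gets wrong is penalized, while solving harder questions earns more credit per token.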

Key Points
  • GRASP uses a three-layer hierarchical graph (entities, propositions, passages) to enable efficient multi-hop retrieval.
  • Achieves top accuracy on MuSiQue and 2WikiMultihopQA while using 40-50% fewer tokens than IRCoT+HippoRAG2 in open retrieval.
  • Introduces 'success economy' metric to measure token efficiency weighted by answer difficulty.

Why It Matters

If the reported 40-50% token savings carry over to production workloads, GRASP could substantially reduce API costs and latency for complex QA systems without sacrificing accuracy.