Research & Papers

Overview of the TREC 2025 Retrieval Augmented Generation (RAG) Track

The benchmark introduces long, narrative queries to push AI systems toward better reasoning and factual grounding.

Deep Dive

The TREC 2025 RAG Track, led by researchers including Shivani Upadhyay and Jimmy Lin, has released its official overview, marking the second year of this influential AI benchmark. Building on the 2024 track, this year's challenge introduces a significant evolution: long, multi-sentence narrative queries designed to mimic complex, real-world information needs that require reasoning. More than 150 teams submitted systems, each an integrated retrieval-and-generation pipeline, pushing the state of the art in AI that can handle deep search tasks beyond simple fact lookup.
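A deep-search pipeline of the kind the track asks for can be sketched in a few lines: retrieve the most relevant passages for a query, then assemble a citation-aware prompt for a generator. Everything below, the toy corpus, the term-overlap retriever, and the prompt template, is an illustrative stand-in, not the track's actual infrastructure or any team's submission:

```python
# Minimal sketch of a retrieval-augmented generation pipeline:
# score documents against the query, keep the top-k, and build a
# prompt that asks the generator to cite its sources by doc id.

def score(query: str, doc: str) -> int:
    """Toy relevance signal: count query terms present in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k highest-scoring documents."""
    ranked = sorted(corpus, key=lambda d: score(query, corpus[d]), reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: dict[str, str], doc_ids: list[str]) -> str:
    """Assemble a grounded prompt with one [docid]-tagged passage per line."""
    context = "\n".join(f"[{d}] {corpus[d]}" for d in doc_ids)
    return f"Answer with citations like [docid].\n{context}\n\nQuestion: {query}"

corpus = {
    "d1": "retrieval augmented generation grounds answers in documents",
    "d2": "narrative queries require multi step reasoning",
    "d3": "cooking pasta requires boiling water",
}
query = "how does retrieval augmented generation ground answers"
top = retrieve(query, corpus)
prompt = build_prompt(query, corpus, top)
```

A real submission would replace the term-overlap scorer with a learned retriever and the prompt template with an actual generator call, but the two-stage retrieve-then-generate shape is the same.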

Participants' systems were tested against the massive MS MARCO V2.1 document corpus. The evaluation framework is notably rigorous, moving beyond simple answer correctness. It employs a multi-layered assessment that judges the relevance of retrieved documents, the completeness of the generated response, and the verification of attribution, ensuring answers are properly grounded in source material. This focus on transparency and factual grounding aims to foster innovation in building RAG systems that are not just capable, but also trustworthy and context-aware for professional and research applications.
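The three layers of that assessment can be mocked up as simple checks. The function names, the nugget list, and the substring-based support test below are assumptions made for illustration; the track's actual judging relies on human and model-based assessors, not string matching:

```python
# Toy three-layer RAG evaluation: (1) retrieval relevance,
# (2) answer completeness as coverage of required facts ("nuggets"),
# (3) attribution verification of cited claims. All data is illustrative.

def retrieval_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Layer 1: fraction of retrieved documents judged relevant."""
    return sum(d in relevant for d in retrieved) / len(retrieved)

def completeness(answer: str, nuggets: list[str]) -> float:
    """Layer 2: fraction of required nuggets the answer covers."""
    return sum(n.lower() in answer.lower() for n in nuggets) / len(nuggets)

def attribution_ok(citations: dict[str, str], sources: dict[str, str]) -> bool:
    """Layer 3: every cited claim must appear in the source it cites."""
    return all(
        doc_id in sources and claim.lower() in sources[doc_id].lower()
        for claim, doc_id in citations.items()
    )

sources = {"d1": "RAG grounds answers in retrieved documents."}
answer = "RAG grounds answers in retrieved documents [d1]."

p = retrieval_precision(["d1", "d9"], relevant={"d1"})
c = completeness(answer, ["grounds answers", "retrieved documents"])
a = attribution_ok({"grounds answers": "d1"}, sources)
```

The point of separating the layers is that a system can fail any one independently: it can retrieve well but answer incompletely, or answer fluently while citing sources that do not support its claims.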

Key Points
  • Introduces complex, narrative queries to test reasoning in RAG systems, moving beyond simple Q&A.
  • Evaluated submissions from more than 150 teams using a multi-layered framework on the MS MARCO V2.1 corpus.
  • Emphasizes attribution verification and answer completeness to build more trustworthy, factual AI assistants.

Why It Matters

This benchmark drives the development of AI that can reliably answer complex, real-world questions for professionals in law, research, and analysis.