Research & Papers

MIRA benchmark uses LLMs to evaluate cross-category search across 4 data types

A new benchmark tests search across publications, data, variables, and tools, using LLMs to generate its topics and relevance judgments.

Deep Dive

MIRA (Multi-category Integrated Retrieval Assessment) is a new benchmark accepted at SIGIR 2026 that addresses the growing need for search systems capable of returning results from diverse data sources. Built upon a large-scale social science search platform, MIRA covers four distinct categories—Publications, Research Data, Variables, and Instruments & Tools—within a single evaluation framework. The benchmark uses real user queries rather than synthetic ones, making it more representative of actual search behavior.

A key innovation is the use of LLMs to generate topic descriptions, narratives, and relevance assessments, which substantially reduces the manual effort typically required to create test collections. MIRA provides the research community with a foundational testbed for studying multi-faceted, category-aware, and integrated information retrieval. The accompanying paper is 8 pages long with 2 figures, and its DOI points to the ACM proceedings.
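To make "category-aware" evaluation concrete, here is a minimal Python sketch of scoring a search system separately for each category. It is illustrative only: the digest does not describe MIRA's file formats or metrics, so the category keys, the `run` and `qrels` dictionary layouts, the document IDs, and the choice of precision@k are all assumptions made for this example.

```python
from collections import defaultdict


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def per_category_precision(run, qrels, k=3):
    """Mean precision@k per category, averaged over all queries.

    run:   {query_id: {category: [doc_id, ...]}}  ranked results (hypothetical layout)
    qrels: {query_id: {category: {doc_id, ...}}}  relevant documents (hypothetical layout)
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for query_id, by_category in run.items():
        for category, ranked_ids in by_category.items():
            relevant = qrels.get(query_id, {}).get(category, set())
            totals[category] += precision_at_k(ranked_ids, relevant, k)
            counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


# Toy data, invented for illustration: one query, two of the four categories.
run = {
    "q1": {
        "publications": ["p1", "p2", "p3"],
        "variables": ["v1", "v2", "v3"],
    }
}
qrels = {
    "q1": {
        "publications": {"p1", "p3"},
        "variables": {"v2"},
    }
}

scores = per_category_precision(run, qrels, k=3)
print(scores)
```

With this toy data, the publications list has two relevant results in its top three (precision 2/3) and the variables list has one (precision 1/3). Reporting scores per category, rather than as one pooled number, shows whether a system serves one result type well while neglecting another.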

Key Points
  • Covers 4 heterogeneous categories: Publications, Research Data, Variables, Instruments & Tools
  • Built on real user queries from a social science search platform, not synthetic data
  • Uses LLMs to auto-generate topic descriptions and relevance assessments, reducing collection cost

Why It Matters

Enables realistic evaluation of unified search across scholarly data types, pushing IR systems toward practical multi-source retrieval.