Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets

Turning generic summaries into query-focused data—without human effort

Deep Dive

Researchers Yllias Chali and Deen Abdullah have proposed a way to address the scarcity of large-scale datasets for Query-Focused Summarization (QFS). Their paper, published on arXiv (2605.05392), presents an evidence-based model that automatically generates query keywords from existing query-free summarization datasets, that is, datasets containing only documents and summaries, with no explicit queries. The method extracts evidence-rich terms from each source document and its corresponding summary, then assembles them into a query that captures the information need implicit in the summary.
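To make the idea of turning a document-summary pair into query keywords concrete, here is a minimal Python sketch. It is not the authors' evidence-based model, whose details are not given here. It assumes a much simpler heuristic: a term counts as "evidence" if it appears in both the summary and the document, and terms are ranked by how often they occur in the document. The function name `generate_query_keywords`, the `max_keywords` parameter, and the tiny stopword list are all hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {
    "the", "a", "an", "of", "in", "on", "and", "to", "is", "are", "was",
    "were", "for", "with", "that", "by", "it", "as", "at", "from", "this",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def generate_query_keywords(document, summary, max_keywords=5):
    """Return summary terms that also occur in the document.

    Hypothetical heuristic, not the paper's model: shared terms are ranked
    by document frequency, with ties kept in summary order.
    """
    doc_counts = Counter(t for t in tokenize(document) if t not in STOPWORDS)
    seen = set()
    candidates = []
    for term in tokenize(summary):
        # Skip stopwords, repeats, and terms with no support in the document.
        if term in STOPWORDS or term in seen or term not in doc_counts:
            continue
        seen.add(term)
        candidates.append(term)
    # list.sort is stable, so equal-frequency terms keep their summary order.
    candidates.sort(key=lambda t: -doc_counts[t])
    return candidates[:max_keywords]


if __name__ == "__main__":
    doc = ("The solar panels on the roof generate power. "
           "Solar power cuts costs for the school.")
    summ = "Solar panels cut school power costs."
    print(generate_query_keywords(doc, summ, max_keywords=3))
```

A pair such as (document, summary) would then become a (document, query, summary) triple, which is the shape of training example a QFS model expects.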

To validate their approach, the team conducted both intrinsic and extrinsic evaluations. Intrinsically, they measured the similarity between the system-generated queries and the original human-written queries from two standard QFS datasets. Extrinsically, they used the generated queries to train several pre-trained models, including a state-of-the-art QFS model, and compared the resulting summaries with those produced using the original queries. Summaries generated with the evidence-based queries achieved ROUGE scores (a standard metric for summarization quality) competitive with those produced with the original queries. The 7-page paper demonstrates that query-free datasets can be repurposed for QFS without manual annotation, which could make far more training data available.
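For readers unfamiliar with ROUGE, the sketch below computes ROUGE-1, the unigram-overlap variant, as precision, recall, and F1 between a candidate summary and a reference. It is a simplified illustration that assumes whitespace tokenization and no stemming; the paper's exact ROUGE configuration is not stated here, and published results normally come from an established ROUGE implementation rather than a hand-rolled one. The function name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return (precision, recall, f1) for unigram overlap.

    Simplified sketch: lowercased whitespace tokens, no stemming.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token,
    # which is the clipped overlap that ROUGE-N uses.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = rouge_1("the cat sat on the mat", "the cat lay on the mat")
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In an extrinsic comparison like the one described above, the same scoring would be applied to summaries generated with the original queries and to summaries generated with the automatically produced queries, and the two sets of scores compared.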

Key Points
  • Proposes an evidence-based model that automatically generates query keywords from query-free summarization datasets
  • Evaluated on two standard QFS datasets; summaries produced with generated queries achieved ROUGE scores competitive with those produced with the original queries
  • Addresses the lack of large-scale QFS training data by repurposing existing query-free datasets

Why It Matters

Enables AI teams to repurpose existing query-free summarization datasets for query-focused tasks without costly human annotation.