ReFeed: Retrieval Feedback-Guided Dataset Construction for Style-Aware Query Rewriting
New method builds training data for query rewriters by finding failed searches and rewriting the queries to match document style.
A research team from KAIST and Seoul National University has introduced ReFeed, a framework for constructing datasets to train style-aware query rewriting models. The problem it addresses is that retrieval systems often fail when user queries differ stylistically or semantically from the language used in domain documents. Query rewriting has been proposed as a solution, but most existing approaches overlook the stylistic characteristics of the target documents: their domain-specific phrasing, tone, and structure. ReFeed takes a data-centric approach built on a feedback loop with three steps: it automatically identifies retrieval failures, uses large language models to rewrite the failed queries in the style of the documents, and verifies each rewrite through re-retrieval.
The resulting corpus of (original, rewritten) query pairs can be used to train specialized rewriter models that are explicitly aware of both document style and retrieval feedback. The work, accepted at the AAAI 2026 Workshop on New Frontiers in Information Retrieval, points toward feedback-driven, data-centric approaches in IR. By aligning query language with real-world document distributions, ReFeed aims to improve the reasoning and adaptability of retrieval-augmented generation (RAG) systems, particularly in specialized domains such as legal, medical, or technical documentation, where stylistic alignment matters for accurate retrieval.
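The feedback loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the retriever is a toy token-overlap ranker, the LLM rewriter is replaced by a glossary lookup, and every name (`retrieve`, `toy_rewrite`, `build_pairs`, `GLOSSARY`) and all sample data are hypothetical.

```python
import re
from typing import Callable

# Hypothetical domain documents and (query, gold document id) samples.
DOCS = {
    "d1": "Myocardial infarction presents with acute chest pain and dyspnea.",
    "d2": "Heart healthy diet tips and advice for an attack of hunger.",
}
SAMPLES = [
    ("heart attack symptoms", "d1"),  # fails, and the rewrite fixes it
    ("acute chest pain", "d1"),       # already succeeds, so it is skipped
    ("shortness of breath", "d1"),    # fails, and the rewrite does not help
]

# Stand-in for an LLM: maps lay phrasing to the documents' phrasing.
GLOSSARY = {"heart attack": "myocardial infarction", "symptoms": "presents with"}


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, docs: dict[str, str], k: int = 1) -> list[str]:
    """Toy retriever: rank documents by token overlap with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(docs[d])), reverse=True)
    return ranked[:k]


def toy_rewrite(query: str) -> str:
    """Assumed placeholder for the LLM rewriting step."""
    out = query.lower()
    for lay, domain in GLOSSARY.items():
        out = out.replace(lay, domain)
    return out


def build_pairs(samples, docs, rewrite: Callable[[str], str], k: int = 1):
    """Keep (original, rewritten) pairs only when re-retrieval succeeds."""
    pairs = []
    for query, gold_id in samples:
        if gold_id in retrieve(query, docs, k):
            continue  # step 1: not a retrieval failure
        candidate = rewrite(query)  # step 2: style-aware rewrite
        if gold_id in retrieve(candidate, docs, k):  # step 3: verify
            pairs.append((query, candidate))
    return pairs


pairs = build_pairs(SAMPLES, DOCS, toy_rewrite)
print(pairs)
# [('heart attack symptoms', 'myocardial infarction presents with')]
```

Only the first sample survives: the second never failed, and the third is discarded because its rewrite still does not retrieve the gold document. In a real pipeline the glossary function would presumably be swapped for an LLM call and the overlap ranker for the production retriever, with the verification step unchanged.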
- Automatically identifies failed retrieval cases and uses LLMs to rewrite queries in the target document's specific style and phrasing.
- Creates a verified corpus of query pairs to train rewriter models explicitly aware of document style and retrieval feedback.
- Highlights a data-centric IR direction where feedback loops and style alignment enhance RAG system performance in real-world domains.
Why It Matters
Could improve RAG system accuracy in specialized fields by bringing queries closer to the precise language used in domain documents.