ReCoVR: New AI system enables multi-round video retrieval, reaching 74.30% R@1 after one feedback round
Single-round video search is limiting; ReCoVR lets users refine results through conversation.
Current composed video retrieval (CoVR) systems allow only a single interaction round: you supply a reference video plus a text modifier and get results back. Real-world visual search, however, is progressive: users often work out what they want only after seeing initial results. A new paper by Bingqing Zhang et al. (institutions not disclosed) introduces ReCoVR (Reflexive Composed Video Retrieval), which formalizes interactive composed video retrieval as a multi-turn process in which users refine their intent through natural-language feedback across rounds.
The system's key innovation is a dual-pathway architecture. The Intent Pathway takes heterogeneous feedback (text edits, relevance judgments) and routes it to complementary retrieval channels, so the search does not collapse into a single narrow query. The Reflection Pathway treats the system's own retrieval history as diagnostic evidence: it tracks how results evolve, detects when the search is drifting or stagnating, and then corrects the trajectory.
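To illustrate the idea of using retrieval history as diagnostic evidence, here is a minimal Python sketch. It is not the paper's method: the function names, the use of Jaccard overlap between top-k result lists, and both thresholds are assumptions chosen purely for illustration. It flags a search as "stagnating" when consecutive rounds return nearly the same results, and as "drifting" when the latest results share almost nothing with the first round.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of video IDs."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def diagnose(history, stagnation_threshold=0.8, drift_threshold=0.2):
    """Classify a search trajectory from its per-round top-k results.

    history: list of top-k ID lists, one per round, oldest first.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if len(history) < 2:
        return "ok"  # nothing to compare against yet
    # Stagnation: the latest round barely changed the result set.
    if jaccard(history[-2], history[-1]) >= stagnation_threshold:
        return "stagnating"
    # Drift: the latest results have little in common with the first round.
    if jaccard(history[0], history[-1]) <= drift_threshold:
        return "drifting"
    return "ok"


steady = [["a", "b", "c", "d"], ["a", "b", "c", "e"]]
stuck = [["a", "b", "c", "d"], ["a", "b", "c", "d"]]
wandering = [["a", "b", "c", "d"], ["a", "e", "f", "g"], ["h", "i", "j", "k"]]

print(diagnose(steady))     # ok
print(diagnose(stuck))      # stagnating
print(diagnose(wandering))  # drifting
```

A real system would presumably compare results in an embedding space and weigh user feedback as well; set overlap is used here only because it keeps the example self-contained.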
On multiple benchmarks, ReCoVR consistently beats interactive baselines. Most notably, after just one interactive turn on the WebVid-CoVR-Test dataset, it achieves 74.30% recall at rank 1 (R@1). That means in nearly three-quarters of searches, the very first result after one round of user feedback is the correct video. The work addresses a structural gap in existing retrieval methods and opens the door to more natural, conversational video search systems. For professionals in video archival, surveillance, or content discovery, this could mean faster and more accurate search loops, with fewer queries typed from scratch.
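For readers unfamiliar with the metric, recall at rank K (R@K) is the fraction of queries whose correct video appears among the top K returned results. The short Python sketch below computes it on made-up data; the function name, video IDs, and rankings are hypothetical and are not taken from the paper or the WebVid-CoVR-Test dataset.

```python
def recall_at_k(ranked_results, ground_truth, k=1):
    """Fraction of queries whose target video is in the top-k results.

    ranked_results: one ranked list of video IDs per query (best first).
    ground_truth: the correct video ID for each query, in the same order.
    """
    if not ranked_results or len(ranked_results) != len(ground_truth):
        raise ValueError("need one non-empty ranking per ground-truth target")
    hits = sum(
        1
        for ranking, target in zip(ranked_results, ground_truth)
        if target in ranking[:k]
    )
    return hits / len(ranked_results)


# Toy data: four queries, each with a ranked list of three candidates.
rankings = [
    ["v7", "v2", "v9"],
    ["v4", "v1", "v3"],
    ["v5", "v8", "v6"],
    ["v2", "v0", "v1"],
]
targets = ["v7", "v1", "v5", "v2"]

print(recall_at_k(rankings, targets, k=1))  # 0.75
print(recall_at_k(rankings, targets, k=2))  # 1.0
```

In this toy case three of the four queries have the correct video in first place, so R@1 is 0.75. A reported R@1 of 74.30% is read the same way, over the queries in the test set.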
- ReCoVR extends composed video retrieval to multiple interactive rounds, letting users refine searches through natural-language feedback.
- Its dual-pathway architecture (Intent Pathway + Reflection Pathway) prevents drift by monitoring retrieval history alongside new feedback.
- It achieves 74.30% R@1 after just one interactive round on the WebVid-CoVR-Test dataset, outperforming the interactive baselines it was compared against.
Why It Matters
ReCoVR makes video search conversational and, on the reported benchmarks, more accurate, which could save professionals time otherwise spent on iterative querying.