Robust Test-time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts
A new benchmark simulates query shifts with 12 perturbation types; HAT-VTR counters the resulting hubness, reducing it by 40%.
A team of researchers from multiple institutions has introduced HAT-VTR (Hubness Alleviation for Test-time Video-Text Retrieval), a framework designed to make video-text retrieval (VTR) models robust to real-world query shifts. Modern VTR models excel on in-distribution benchmarks but fail when query distributions deviate from the training data, a problem the team evaluates systematically with a new benchmark featuring 12 distinct video perturbation types, each at 5 severity levels. Their analysis reveals that query shifts amplify the hubness phenomenon, in which a few gallery items become dominant 'hubs' that attract a disproportionate share of queries, crowding out correct matches and causing sharp performance drops.
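Hubness can be quantified with a standard diagnostic that predates this paper: the skewness of the k-occurrence distribution, i.e., how often each gallery item appears in the queries' top-k neighbor lists. Below is a minimal NumPy sketch on toy data; the matrix sizes and the value of k are illustrative assumptions, not from the paper.

```python
import numpy as np

def k_occurrence(sim, k=10):
    """N_k: how often each gallery item appears in the queries' top-k lists."""
    # sim: (num_queries, num_gallery) similarity matrix
    topk = np.argsort(-sim, axis=1)[:, :k]
    return np.bincount(topk.ravel(), minlength=sim.shape[1])

def hubness_skewness(sim, k=10):
    """Skewness of the N_k distribution; high positive skew means
    a few gallery items dominate the neighbor lists (hubs)."""
    nk = k_occurrence(sim, k).astype(float)
    return ((nk - nk.mean()) ** 3).mean() / (nk.std() ** 3 + 1e-12)

# Toy data: sizes are illustrative, not from the paper.
rng = np.random.default_rng(0)
sim = rng.normal(size=(1000, 512)) @ rng.normal(size=(500, 512)).T
print(f"N_k skewness: {hubness_skewness(sim):.2f}")
```

Even i.i.d. Gaussian embeddings show positive skew under dot-product similarity, since large-norm gallery items attract many queries; the paper's finding is that query shifts push real embeddings further in this direction.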
To counter this, HAT-VTR combines two key components: a Hubness Suppression Memory that refines similarity scores to curb hub dominance, and multi-granular losses that enforce temporal feature consistency across video frames. Extensive experiments show that HAT-VTR substantially improves robustness, consistently outperforming prior methods across diverse query-shift scenarios. Accepted at ICLR 2026, the work provides a critical baseline for deploying VTR systems in dynamic environments where query patterns shift unpredictably, such as surveillance, content moderation, or personalized video search.
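The article does not spell out the memory's internals, so the sketch below illustrates a well-known related idea, querybank normalization via inverted softmax from prior VTR work, rather than the paper's actual method: keep a rolling memory of recent query embeddings and renormalize each gallery item's score by how strongly it attracts the whole bank, so hubs receive a large penalty. The class name, bank size, and temperature are assumptions.

```python
import numpy as np
from collections import deque

class HubSuppressionBank:
    """Illustrative memory-based hub suppression (QB-Norm-style inverted
    softmax), NOT the paper's actual Hubness Suppression Memory."""

    def __init__(self, bank_size=256, beta=20.0):
        self.bank = deque(maxlen=bank_size)  # rolling memory of query embeddings
        self.beta = beta                     # softmax inverse temperature

    def refine(self, query, gallery):
        # query: (d,), gallery: (num_gallery, d); both L2-normalized
        self.bank.append(query)
        bank_sim = np.stack(self.bank) @ gallery.T  # (bank, num_gallery)
        raw = gallery @ query                       # raw cosine scores, (num_gallery,)
        # Inverted softmax over the bank: a hub scores highly against many
        # banked queries, accumulating a large normalizer and a lower score.
        shift = bank_sim.max(axis=0)                # per-column max for stability
        norm = np.exp(self.beta * (bank_sim - shift)).sum(axis=0)
        return np.exp(self.beta * (raw - shift)) / norm
```

Calling refine(query, gallery) per incoming query keeps the hub estimates adapting as the query distribution drifts, since old queries fall out of the bounded deque.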
- Benchmark includes 12 perturbation types (e.g., blur, occlusion, temporal jitter) at 5 severity levels to simulate real-world query shifts.
- HAT-VTR uses a Hubness Suppression Memory to refine similarity scores and multi-granular losses for temporal consistency (a toy sketch of such a loss follows this list).
- Accepted at ICLR 2026; outperforms prior methods across all tested query shift scenarios.
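The multi-granular losses are only named in the summary, so the following PyTorch sketch is purely illustrative of how temporal consistency might be enforced at two granularities: a fine-grained term pulls adjacent frame features together, and a coarse term pulls each frame toward the pooled video embedding. Function and argument names are hypothetical.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(frame_feats, video_feat):
    """Illustrative two-granularity consistency loss (names hypothetical).
    frame_feats: (T, d) per-frame embeddings; video_feat: (d,) pooled embedding."""
    f = F.normalize(frame_feats, dim=-1)
    v = F.normalize(video_feat, dim=-1)
    # Fine granularity: adjacent frames should stay close under perturbation.
    frame_term = (1 - F.cosine_similarity(f[:-1], f[1:], dim=-1)).mean()
    # Coarse granularity: each frame should agree with the video embedding.
    video_term = (1 - f @ v).mean()
    return frame_term + video_term
```

A call like temporal_consistency_loss(frame_feats, frame_feats.mean(dim=0)) yields a scalar term that could be added to a retrieval objective.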
Why It Matters
Makes video retrieval reliable in unpredictable environments like surveillance or content moderation.