VQPP: Video Query Performance Prediction Benchmark
New benchmark with 56K text queries and 51K videos tackles the underexplored problem of predicting video search performance.
A team from the University of Bucharest has launched VQPP, the first dedicated benchmark for predicting the performance of video searches. Query Performance Prediction (QPP) is a critical task in information retrieval, used to gauge how well a search system will answer a query before or after executing it. While extensively studied for text and images, QPP for content-based video retrieval (CBVR) has remained largely unexplored. The VQPP benchmark aims to fill this gap, providing a standardized playground for developing and comparing video QPP models.
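The article doesn't show how a QPP model actually scores a query, so here is a minimal sketch of one classic pre-retrieval signal: average inverse document frequency (IDF) of the query terms over the searchable collection (here, toy video captions). The function names and corpus are illustrative, not from the VQPP paper:

```python
import math
from collections import Counter

def idf_table(captions):
    """Map each term to its inverse document frequency over a toy caption corpus."""
    n = len(captions)
    df = Counter()
    for cap in captions:
        df.update(set(cap.lower().split()))  # count each term once per caption
    return {t: math.log(n / df[t]) for t in df}

def avg_idf_score(query, idf, default=0.0):
    """Pre-retrieval predictor: queries made of rarer (higher-IDF) terms are
    expected to be more discriminative, hence easier to retrieve for."""
    terms = query.lower().split()
    return sum(idf.get(t, default) for t in terms) / max(len(terms), 1)
```

For example, over captions like `["a dog runs", "a cat sleeps", "a dog barks"]`, the query "dog barks" scores higher than "a dog", since "barks" appears in only one caption while "a" appears in all of them. Real pre-retrieval predictors add more signals (query length, term co-occurrence, embedding-based clarity), but the shape is the same: a cheap score computed without running the search.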
The benchmark is substantial, comprising two distinct text-to-video retrieval datasets with a total of 56,000 text queries and 51,000 videos. It comes with official training, validation, and test splits to ensure reproducible and comparable research. The researchers evaluated multiple pre-retrieval (predicting before search) and post-retrieval (predicting after search) performance predictors. A key finding is that pre-retrieval predictors achieved competitive performance, which is significant because it allows systems to estimate search quality without running the computationally expensive video retrieval step first.
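How does one decide that a predictor is "competitive"? The standard protocol (the article doesn't spell out which metrics VQPP uses) is rank correlation between predicted scores and each query's actual retrieval quality, e.g. average precision. A pure-Python Kendall's tau-a, shown here as a sketch:

```python
def kendall_tau(predicted, actual):
    """Kendall's tau-a: fraction of concordant minus discordant query pairs,
    comparing predicted difficulty against measured retrieval quality."""
    n = len(predicted)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A predictor whose scores perfectly match the true per-query ranking gets tau = 1.0; a perfectly inverted one gets -1.0. The headline finding — that cheap pre-retrieval predictors correlate competitively with true quality — is what makes it possible to triage or reformulate queries before paying for the video search itself.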
To demonstrate practical utility, the authors used the best-performing pre-retrieval predictor as a reward model to train a large language model (LLM) on the task of query reformulation via Direct Preference Optimization (DPO). This shows how VQPP can directly feed into improving real-world search systems by generating better search queries. The release of the benchmark and code is a foundational step that could accelerate progress in video search, recommendation systems, and multimedia AI, enabling smarter, more efficient retrieval from the vast and growing repositories of video content online.
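DPO trains on (prompt, chosen, rejected) preference triples, so using a QPP predictor as a reward model amounts to ranking candidate reformulations by predicted quality and pairing higher-scoring ones as "chosen" against lower-scoring ones as "rejected". A sketch of that pairing step, with a hypothetical `predict_quality` standing in for the paper's predictor:

```python
def build_dpo_pairs(query, candidates, predict_quality, margin=0.1):
    """Turn predictor scores into DPO preference triples.

    For every pair of candidate reformulations whose predicted-quality gap
    exceeds `margin`, emit a (prompt, chosen, rejected) record. The margin
    filters out pairs the predictor cannot reliably distinguish.
    """
    scored = [(c, predict_quality(c)) for c in candidates]
    pairs = []
    for c1, s1 in scored:
        for c2, s2 in scored:
            if s1 - s2 > margin:
                pairs.append({"prompt": query, "chosen": c1, "rejected": c2})
    return pairs
```

The resulting records are in the format preference-optimization libraries expect, so the predictor's judgments can supervise an LLM without any human-labeled preferences — the key efficiency argument for plugging QPP into query reformulation.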
- First benchmark for Video Query Performance Prediction (VQPP), containing 56K queries and 51K videos across two datasets.
- Shows pre-retrieval predictors are competitive, enabling quality estimation before costly video search execution.
- Used the best predictor as a reward model to train an LLM for query reformulation via Direct Preference Optimization (DPO).
Why It Matters
Enables smarter, more efficient video search systems and paves the way for AI that can better understand and retrieve multimedia content.