Interactive Episodic Memory with User Feedback
Correct your AI's mistakes mid-search with simple feedback like "before this" or "not the white one".
A team from the University of Texas at Austin and Adobe Research has proposed a new approach to episodic memory search in egocentric video that lets users correct the AI in real time. Their paper, accepted to CVPR 2026, introduces the Episodic Memory with Questions and Feedback (EM-QnF) task, where users can provide natural language feedback after an initial prediction to refine the answer. For example, after asking "Where did I place the mug?" and getting a wrong result, a user can say "Before this. I'm looking for the big blue mug, not the white one" to guide the model to the correct moment.
The core technical contribution is a lightweight, plug-and-play Feedback Alignment Module (FALM) that adapts existing episodic memory models to incorporate user feedback without expensive sequential optimization. The authors also collected datasets for feedback-based interaction to train and evaluate their system. On three challenging benchmarks, FALM significantly improved over state-of-the-art EM-NLQ models and was competitive with commercial large vision-language models, while remaining efficient enough for real-world use. Evaluation with human-generated feedback showed strong generalization to real-world scenarios, marking a practical step toward interactive AI assistants that can learn from user corrections on the fly.
- Introduces the EM-QnF task, allowing users to give natural language feedback (e.g., "not the white one, before this") to refine video search results.
- Proposes FALM, a plug-and-play module that adapts existing episodic memory models to use feedback without expensive retraining.
- Outperforms state-of-the-art models on three benchmarks and matches commercial large vision-language models while staying efficient.
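To make the interaction loop concrete, here is a minimal, hypothetical sketch of how feedback like "before this" and "not the white one" can constrain a re-ranking step over candidate video segments. This is not the paper's FALM module; the `Candidate` structure, the keyword matching, and the re-ranking logic are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    start: float   # segment start time (seconds)
    end: float     # segment end time (seconds)
    score: float   # model confidence for the original query
    caption: str   # hypothetical text description of the segment

def refine(candidates, prediction, feedback):
    """Toy refinement: re-rank candidates given one feedback string.

    Illustrative stand-in only -- it shows the kind of temporal and
    attribute constraints user feedback imposes, not the real model.
    """
    feedback = feedback.lower()
    refined = list(candidates)
    if "before this" in feedback:
        # Temporal feedback: keep only segments ending before the prediction.
        refined = [c for c in refined if c.end <= prediction.start]
    if "not the" in feedback:
        # Negation feedback, e.g. "not the white one": drop matching segments.
        negated = feedback.split("not the", 1)[1].split()[0]  # e.g. "white"
        refined = [c for c in refined if negated not in c.caption]
    return max(refined, key=lambda c: c.score) if refined else prediction

# Usage: the initial top prediction is wrong; feedback corrects it.
candidates = [
    Candidate(10.0, 14.0, 0.62, "places big blue mug on shelf"),
    Candidate(30.0, 33.0, 0.58, "picks up white mug"),
    Candidate(40.0, 44.0, 0.71, "places white mug on counter"),
]
first = max(candidates, key=lambda c: c.score)  # white-mug segment wins at first
fixed = refine(candidates, first,
               "Before this. I'm looking for the big blue mug not the white one")
print(fixed.caption)  # → places big blue mug on shelf
```

The point of the sketch is that a single feedback utterance carries two separable signals, a temporal one ("before this") and an attribute one ("not the white one"), which is the kind of correction the EM-QnF task asks models to exploit.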
Why It Matters
Moves AI from one-shot video search to interactive correction, making it practical for real-world memory retrieval.