Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition
Researchers' MLLM-based framework achieves superior zero-shot performance on HICO-DET and V-COCO benchmarks.
Researchers from Nanjing University of Science and Technology propose a zero-shot Human-Object Interaction (HOI) detection framework that decouples object detection from interaction recognition (IR) using multimodal large language models (MLLMs). IR is framed as a visual question-answering task with a constrained, deterministic output format. Because the two stages are decoupled, the method works with any off-the-shelf object detector without retraining, and it achieves strong cross-dataset generalization, outperforming existing zero-shot methods on the standard HICO-DET and V-COCO benchmarks.
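To make the decoupled design concrete, the sketch below shows one plausible shape of such a pipeline: an arbitrary detector proposes boxes, then each human-object pair is turned into a closed-ended VQA prompt whose answer is constrained to a fixed verb vocabulary. All names here (`detect_hois`, `build_prompt`, the stub detector and MLLM) are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of a detector-agnostic, MLLM-based HOI pipeline.
# The detector and MLLM are injected as callables, so any detector works unchanged.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Box:
    label: str
    xyxy: Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Closed verb vocabulary -> the MLLM's answer space is fixed, hence deterministic.
VERBS = ["ride", "hold", "no_interaction"]

def build_prompt(obj_label: str) -> str:
    # Closed-ended VQA prompt: the MLLM must answer with exactly one listed verb.
    return (f"Which interaction best describes the person and the {obj_label}? "
            f"Answer with one of: {', '.join(VERBS)}.")

def detect_hois(image, detector: Callable, mllm: Callable) -> List[Tuple[Box, str, Box]]:
    """Pair every detected person with every object; ask the MLLM for the verb."""
    boxes = detector(image)
    humans = [b for b in boxes if b.label == "person"]
    objects = [b for b in boxes if b.label != "person"]
    triplets = []
    for h in humans:
        for o in objects:
            answer = mllm(image, h, o, build_prompt(o.label)).strip().lower()
            # Constrain free-form text back onto the closed vocabulary.
            verb = answer if answer in VERBS else "no_interaction"
            triplets.append((h, verb, o))
    return triplets
```

With stub components, swapping the detector requires no change to the IR stage, which is the detector-agnostic property the paper claims:

```python
stub_detector = lambda img: [Box("person", (0, 0, 50, 100)), Box("bicycle", (30, 40, 120, 110))]
stub_mllm = lambda img, h, o, prompt: "ride"
print(detect_hois(None, stub_detector, stub_mllm))  # one <person, ride, bicycle> triplet
```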
Why It Matters
Enables AI systems to understand complex human-object interactions in images without task-specific training data.