Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition
Researchers' MLLM-based framework achieves superior zero-shot performance on HICO-DET and V-COCO benchmarks.
Researchers from Nanjing University of Science and Technology propose a zero-shot Human-Object Interaction (HOI) detection framework that decouples object detection from interaction recognition (IR) using multimodal large language models (MLLMs). IR is framed as a visual question-answering task with a constrained, deterministic output format. Because the two stages are decoupled, the method works with any off-the-shelf object detector without retraining, and it achieves strong cross-dataset generalization, outperforming existing zero-shot methods on the standard HICO-DET and V-COCO benchmarks.
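To make the decoupled design concrete, the sketch below shows one plausible shape of such a pipeline: an arbitrary detector proposes boxes, then each human-object pair is turned into a closed-ended VQA prompt whose answer is constrained to a fixed verb vocabulary. All names here (`detect_hois`, `build_prompt`, the stub detector and MLLM) are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of a detector-agnostic, MLLM-based HOI pipeline.
# The detector and MLLM are injected as callables, so any detector works unchanged.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Box:
    label: str
    xyxy: Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Closed verb vocabulary -> the MLLM's answer space is fixed, hence deterministic.
VERBS = ["ride", "hold", "no_interaction"]

def build_prompt(obj_label: str) -> str:
    # Closed-ended VQA prompt: the MLLM must answer with exactly one listed verb.
    return (f"Which interaction best describes the person and the {obj_label}? "
            f"Answer with one of: {', '.join(VERBS)}.")

def detect_hois(image, detector: Callable, mllm: Callable) -> List[Tuple[Box, str, Box]]:
    """Pair every detected person with every object; ask the MLLM for the verb."""
    boxes = detector(image)
    humans = [b for b in boxes if b.label == "person"]
    objects = [b for b in boxes if b.label != "person"]
    triplets = []
    for h in humans:
        for o in objects:
            answer = mllm(image, h, o, build_prompt(o.label)).strip().lower()
            # Constrain free-form text back onto the closed vocabulary.
            verb = answer if answer in VERBS else "no_interaction"
            triplets.append((h, verb, o))
    return triplets
```

With stub components, swapping the detector requires no change to the IR stage, which is the detector-agnostic property the paper claims:

```python
stub_detector = lambda img: [Box("person", (0, 0, 50, 100)), Box("bicycle", (30, 40, 120, 110))]
stub_mllm = lambda img, h, o, prompt: "ride"
print(detect_hois(None, stub_detector, stub_mllm))  # one <person, ride, bicycle> triplet
```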
Why It Matters
Enables AI systems to understand complex human-object interactions in images without task-specific training data.