Robotics

IJCAI 2026 survey: 4 ways to train robots from human videos

arXiv cs.RO June 02, 2026

⚡ Human videos are cheap and robot data is scarce. Here’s how to bridge the gap.

Deep Dive

Robot manipulation models typically require expensive, embodiment-specific demonstrations. A survey by Zhiyuan Feng and 14 co-authors, accepted at IJCAI 2026, presents a unified framework for leveraging abundant human videos instead. They categorize approaches into four classes: (i) latent action representations that encode inter-frame changes, (ii) predictive world models that forecast future frames, (iii) explicit 2D supervision extracting image-plane cues, and (iv) explicit 3D reconstruction recovering geometry or motion. Each class addresses a different aspect of the embodiment and annotation gaps.
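To make class (i) concrete, here is a toy sketch of the idea of a latent action as a compact code for the change between two frames. It is not taken from the survey or from any method it covers: the 2-D "frame features", the five-entry `CODEBOOK`, and the helper names `latent_action` and `label_video` are all hypothetical, and published systems typically learn both the features and the codebook from video rather than hard-coding them.

```python
from math import dist

# Assumption: each frame is already reduced to a 2-D feature,
# e.g. the tracked image position of a hand. Real methods learn this.
# Assumption: a hand-written codebook of prototype inter-frame changes.
CODEBOOK = {
    "left": (-1.0, 0.0),
    "right": (1.0, 0.0),
    "up": (0.0, 1.0),
    "down": (0.0, -1.0),
    "still": (0.0, 0.0),
}


def latent_action(frame_t, frame_t1):
    """Return the codebook label nearest to the change between two frames."""
    delta = tuple(b - a for a, b in zip(frame_t, frame_t1))
    return min(CODEBOOK, key=lambda name: dist(CODEBOOK[name], delta))


def label_video(frames):
    """Assign one latent-action label to every consecutive frame pair."""
    return [latent_action(a, b) for a, b in zip(frames, frames[1:])]


# An unlabeled "video" of four frames yields three pseudo-action labels.
video = [(0.0, 0.0), (0.9, 0.1), (1.0, 1.1), (1.0, 1.1)]
print(label_video(video))  # ['right', 'up', 'still']
```

The point of the sketch is only that an action-free video can be turned into a sequence of discrete pseudo-actions; mapping such codes to robot-executable commands is a separate step.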

The survey also highlights three open challenges: turning unstructured human videos into training-ready episodes, grounding video-derived supervision into robot-executable actions despite embodiment and viewpoint differences, and designing evaluation protocols that better predict real-world deployment performance. The authors provide a curated list of papers and resources. This work offers a roadmap for scalable, human-data-driven robot learning, potentially reducing the cost and increasing the generality of embodied AI.

Key Points

Identifies four methods to extract action information from human videos: latent actions, world models, 2D supervision, and 3D reconstruction.
Addresses core challenges: structuring unlabeled videos, bridging embodiment gaps, and improving evaluation protocols.
Accepted at IJCAI 2026 Survey Track; includes a curated resource list for researchers.

Why It Matters

Scales robot learning by using cheap human videos, reducing reliance on costly robot demos and enabling more flexible embodied AI.
