Robotics

EgoAERO learns dexterous robot manipulation from a single video

arXiv cs.RO June 09, 2026

⚡No 3D scans needed — just one egocentric RGB-D video to teach robots.

Deep Dive

EgoAERO, developed by a team led by Yichen Niu, tackles a long-standing bottleneck in robot learning: the need for costly pre-scanned object models. Using a single egocentric RGB-D video of a human hand manipulating an object, the framework reconstructs contact-consistent hand-object trajectories without any prior knowledge of the object’s geometry. It does this through asset-free object tracking, ego-motion compensation, and adaptive contact optimization, which work together to infer both object pose and hand-object interactions. These trajectories are then converted into robot policies via a two-stage residual learning approach, allowing a robot to replicate the dexterous task after seeing just one demonstration.
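To make the ego-motion compensation step more concrete, here is a minimal sketch of the general idea, not the authors' implementation. It assumes per-frame camera-to-world poses and per-frame object poses in the camera frame are already available as 4x4 homogeneous transforms. The function names (`compensate_ego_motion`, `make_pose`, `matmul4`, `position`) and the toy poses are hypothetical and chosen only for illustration. The sketch re-expresses each object pose in a fixed world frame so that head motion of the camera wearer no longer shows up as apparent object motion.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 homogeneous transforms given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def make_pose(yaw_rad, translation):
    """Build a 4x4 transform: rotation about z by yaw_rad plus a translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def compensate_ego_motion(camera_to_world, object_in_camera):
    """Express per-frame object poses in a fixed world frame.

    For each frame t: T_world_object[t] = T_world_camera[t] @ T_camera_object[t].
    """
    if len(camera_to_world) != len(object_in_camera):
        raise ValueError("need one camera pose per object pose")
    return [matmul4(t_wc, t_co)
            for t_wc, t_co in zip(camera_to_world, object_in_camera)]


def position(transform):
    """Return the translation part of a 4x4 transform as an (x, y, z) tuple."""
    return (transform[0][3], transform[1][3], transform[2][3])


# Toy data (assumed): an object resting at world position (1, 0, 0) while the
# camera first shifts along x, then rotates 90 degrees about z at the origin.
camera_to_world = [
    make_pose(0.0, (0.0, 0.0, 0.0)),
    make_pose(0.0, (0.2, 0.0, 0.0)),
    make_pose(math.pi / 2, (0.0, 0.0, 0.0)),
]
object_in_camera = [
    make_pose(0.0, (1.0, 0.0, 0.0)),
    make_pose(0.0, (0.8, 0.0, 0.0)),
    make_pose(0.0, (0.0, -1.0, 0.0)),
]

world_poses = compensate_ego_motion(camera_to_world, object_in_camera)
world_positions = [position(t) for t in world_poses]
```

In the camera frame the object appears to move between frames, but after compensation its world position stays fixed up to floating-point error. A real system would also have to estimate the camera poses themselves, which this sketch takes as given.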

To support broader research, the authors introduce EgoDex-R, a large-scale dataset containing 4.3 million RGB-D frames of dexterous manipulations captured from an egocentric viewpoint. In both simulation and real-world experiments, EgoAERO achieves single-demonstration dexterous manipulation with performance approaching that of CAD-based reconstructions on the HOI4D benchmark. This marks a significant step toward scaling robot learning from human video, dramatically reducing the data and asset preparation required for teaching robots fine-grained manipulation skills.

Key Points

First framework to learn dexterous manipulation from a single egocentric RGB-D video without requiring pre-scanned object assets.
Uses asset-free object tracking, ego-motion compensation, and adaptive contact optimization to reconstruct contact-consistent hand-object trajectories.
Introduces EgoDex-R, a 4.3-million-frame egocentric RGB-D dataset for dexterous policy learning; performance approaches that of CAD-based reconstructions on HOI4D.

Why It Matters

Removes the need for 3D object scans, making robot learning from human video far more scalable and practical.
