Robotics

SeeTraceAct: Robots learn new tasks from a single human demo video

arXiv cs.RO June 03, 2026

⚡ One-shot robot learning from human video, no teleoperation needed.

Deep Dive

Training robots for new tasks typically requires costly task-specific teleoperation data. In a new paper, researchers present SeeTraceAct, a demo-conditioned vision-language-action model that learns from just one demonstration video of an unseen task, even if the demo comes from a different embodiment (e.g., a human). The key innovation is visibility-aware prediction of future end-effector traces, which forces the model to precisely localize small target regions rather than relying on coarse visual cues. Because a single video stands in for task-specific teleoperation data, the cost of collecting data for each new task drops sharply.
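The article does not say how visibility enters the training objective. One plausible reading, sketched here purely as an assumption, is that future end-effector waypoints that are occluded or off-screen are excluded from the trace loss, so the model is supervised only where the target can actually be seen. The function name, the 2D image-coordinate representation, and the mean-squared-error form below are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) image coordinates of the end effector


def visibility_masked_trace_loss(
    pred: List[Point], target: List[Point], visible: List[bool]
) -> float:
    """Mean squared 2D error over the waypoints flagged as visible.

    Illustrative only: an assumed form of a visibility-aware trace loss,
    not the objective used by SeeTraceAct.
    """
    if not (len(pred) == len(target) == len(visible)):
        raise ValueError("pred, target, and visible must have the same length")

    total = 0.0
    count = 0
    for (px, py), (tx, ty), vis in zip(pred, target, visible):
        if not vis:
            # Occluded or off-screen waypoint: contributes nothing to the loss.
            continue
        total += (px - tx) ** 2 + (py - ty) ** 2
        count += 1

    # With no visible waypoints there is nothing to supervise.
    return total / count if count else 0.0


# Toy trace: the third waypoint is badly predicted but hidden from view.
pred = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
target = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0)]
masked = visibility_masked_trace_loss(pred, target, [True, True, False])
unmasked = visibility_masked_trace_loss(pred, target, [True, True, True])
print(masked, unmasked)
```

In this toy trace the hidden waypoint carries most of the raw error, so masking it changes the loss substantially. That is the general motivation for treating visibility explicitly; how the paper does so in practice is not stated in this summary.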

To enable reproducible evaluation with cross-embodiment demonstrations, the team releases RoboCasa-DC, an extension of RoboCasa with episode-paired humanoid videos. Experiments on both RoboCasa-DC and a real-world setup, in which a Franka Panda arm is conditioned on human demonstrations, show that SeeTraceAct outperforms existing end-to-end baselines: it achieves the best success rate across all four simulated settings and a 12.5 percentage point improvement in real-world average success. The results point toward robots that can learn new skills from casual human videos, without expensive teleoperation or hundreds of examples.

Key Points

SeeTraceAct enables one-shot learning from a single human demonstration video, with no teleoperation required.
Uses visibility-aware prediction of future end-effector traces for precise spatial grounding of small targets.
Achieves a 12.5 percentage point improvement in real-world average success rate, with a Franka Panda arm conditioned on human demos.

Why It Matters

One-shot robot learning from human video drastically cuts data costs, enabling rapid task adaptation in real-world settings.
