Generative Simulation for Policy Learning in Physical Human-Robot Interaction
A new framework uses LLMs to generate synthetic training data, achieving over 80% success in real-world assistive tasks.
A team of researchers has developed a novel AI pipeline called 'text2sim2real' that automates the creation of training data for robots that physically interact with humans. The system uses large language models (LLMs) and vision-language models (VLMs) to interpret high-level text prompts and procedurally generate entire simulation environments. This includes creating soft-body human models, designing scene layouts, and synthesizing appropriate robot motion trajectories for assistive tasks, all without manual coding or data collection.
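The paper's actual tooling is not reproduced here, but the core idea can be sketched: prompt an LLM for a structured scene specification, then hand that spec to a simulator builder. Everything in the sketch below, including the model choice, the JSON schema, and the generate_scene_spec function, is a hypothetical illustration of the approach, not the authors' code.

```python
# Minimal sketch of the generative-simulation idea: an LLM turns a
# high-level task prompt into a structured scene spec that a downstream
# simulator builder could instantiate. All names and the schema are
# illustrative assumptions, not the text2sim2real implementation.
import json
from openai import OpenAI

client = OpenAI()

def generate_scene_spec(task_prompt: str) -> dict:
    """Ask an LLM to emit a JSON description of a simulated assistive scene."""
    system = (
        "You design robot simulation scenes. Reply with JSON containing: "
        "'human_model' (pose and soft-body limb parameters), 'scene_objects', "
        "and 'robot_trajectory_hints' (waypoints for the assistive motion)."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": task_prompt},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

spec = generate_scene_spec(
    "A robot arm gently scratches an itch on a seated person's forearm."
)
# A downstream builder would map this spec onto soft-body assets and
# motion primitives inside the simulator (hypothetical step, not shown).
print(spec)
```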
This generative simulation framework was used to autonomously collect large-scale synthetic demonstration datasets. The researchers then trained vision-based imitation learning policies on this data; the policies take segmented point clouds as input. The trained policies were evaluated in a user study on two real-world assistive tasks: scratching an itch and bathing. Remarkably, the policies transferred zero-shot from simulation to the real world, working on a physical robot without any additional real-world training, attaining success rates exceeding 80% and remaining robust to unscripted human motion.
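For intuition, here is a minimal behavior-cloning sketch for a policy that maps a segmented point cloud to a robot action, assuming a simplified PointNet-style encoder. The authors' actual architecture, action space, and training objective are not described here, so the layer sizes, the 7-dimensional action, and the MSE loss are all assumptions.

```python
# A minimal behavior-cloning sketch for a point-cloud policy. Architecture
# and action space are illustrative assumptions, not the paper's method.
import torch
import torch.nn as nn

class PointCloudPolicy(nn.Module):
    def __init__(self, action_dim: int = 7):  # e.g. 6-DoF pose delta + gripper
        super().__init__()
        # Per-point MLP followed by max-pooling: a simplified PointNet encoder.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3) segmented point cloud
        feats = self.point_mlp(points)          # (batch, num_points, 256)
        global_feat = feats.max(dim=1).values   # permutation-invariant pooling
        return self.head(global_feat)           # predicted action

policy = PointCloudPolicy()
optim = torch.optim.Adam(policy.parameters(), lr=1e-4)

# One behavior-cloning step on a stand-in demonstration batch; in the
# framework, clouds and expert actions would come from the generated sims.
clouds = torch.randn(32, 1024, 3)
expert_actions = torch.randn(32, 7)
loss = nn.functional.mse_loss(policy(clouds), expert_actions)
optim.zero_grad(); loss.backward(); optim.step()
```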
The work represents a significant leap in addressing the data scarcity problem that has long hindered the development of robust physical human-robot interaction systems. By automating the entire pipeline from environment synthesis to policy learning, the method drastically reduces the time, cost, and safety risks associated with collecting real-world physical interaction data. The team has made additional information available on their project website, providing a foundation for future research in generative simulation for robotics.
Key Takeaways
- The 'text2sim2real' framework uses LLMs/VLMs to generate synthetic training data from text prompts, automating simulation creation.
- Policies trained on this synthetic data achieved over 80% success with zero-shot transfer on real-world assistive tasks: scratching an itch and bathing.
- The method demonstrates resilience to unscripted human motion, a critical hurdle for safe and reliable physical human-robot interaction.
Why It Matters
This drastically reduces the cost and safety risks of training assistive robots, accelerating development for healthcare and personal assistance.