Robotics

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

New framework uses frozen diffusion policies and RL to create expert robot controllers from messy demonstrations.

Deep Dive

A research team from UT Austin, Toyota Research Institute, and other institutions has introduced ExpertGen, a framework designed to solve a major bottleneck in robotics: the scarcity of high-quality, real-world demonstration data. Traditional pipelines rely on expensive human teleoperation; ExpertGen instead automates expert policy generation entirely in simulation for scalable sim-to-real transfer. The core innovation is a two-stage process. First, it initializes a "behavior prior" by training a diffusion model on imperfect demonstrations, such as data synthesized by large language models or collected from non-expert humans. Crucially, this pre-trained diffusion policy is then frozen.
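To make the frozen-prior idea concrete, here is a toy sketch (hypothetical names and a simplified linear denoiser, not the paper's architecture): once the diffusion policy's weights are frozen, the only free input at deployment time is the initial noise vector, and the mapping from (observation, noise) to action is deterministic.

```python
import numpy as np

class FrozenDiffusionPolicy:
    """Toy stand-in for a diffusion policy pretrained on imperfect demos.

    Weights are fixed after pretraining ("frozen"); the only free input
    at deployment time is the initial noise vector. The linear denoiser
    below is illustrative only.
    """

    def __init__(self, obs_dim, act_dim, n_steps=10, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen pretrained weights: never updated after this point.
        self.W_obs = rng.normal(scale=0.1, size=(act_dim, obs_dim))
        self.W_noise = rng.normal(scale=0.1, size=(act_dim, act_dim))
        self.n_steps = n_steps

    def act(self, obs, noise):
        """Deterministically map (observation, initial noise) -> action."""
        x = noise
        for t in range(self.n_steps, 0, -1):
            # Simplified denoising update: nudge x toward a prediction
            # conditioned on the observation.
            pred = self.W_obs @ obs + self.W_noise @ x
            x = x + (pred - x) / t
        return x

policy = FrozenDiffusionPolicy(obs_dim=4, act_dim=2)
obs = np.ones(4)
z = np.zeros(2)
a1 = policy.act(obs, z)
a2 = policy.act(obs, z)  # same noise in, same action out
```

Because the frozen policy is a fixed deterministic function of the noise, choosing good actions reduces to choosing good noise, which is exactly what the second stage exploits.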

In the second stage, reinforcement learning is applied not to the policy's parameters but to the initial noise fed into the frozen diffusion model. Optimizing in noise space regularizes exploration, keeping it within safe, human-like behavior patterns while enabling effective learning from sparse reward signals alone, with no need for complex reward engineering. Empirical results are striking: on challenging industrial assembly benchmarks, ExpertGen achieved a 90.5% overall success rate, and it attained 85% success on long-horizon manipulation tasks, outperforming all baseline methods. The resulting policies demonstrated dexterous control and robustness across diverse conditions. To validate real-world applicability, the team distilled these state-based policies into visuomotor policies using DAgger (Dataset Aggregation) and successfully deployed them on physical robotic hardware, confirming effective sim-to-real transfer.
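One minimal way to see how optimizing only the initial noise can work with a sparse reward is black-box search in noise space. The sketch below uses a cross-entropy-style update (an illustrative stand-in; the paper's actual RL algorithm may differ), treating the frozen policy as a fixed deterministic map from noise to action:

```python
import numpy as np

def sparse_reward(action, goal, tol=0.05):
    # Binary task-success signal: no shaped reward engineering.
    return float(np.linalg.norm(action - goal) < tol)

def optimize_noise(policy_act, obs, goal, dim, iters=50, pop=64, elite=8, seed=0):
    """Cross-entropy search over the frozen policy's initial noise.

    `policy_act(obs, z)` is any deterministic frozen policy; only the
    sampling distribution over z is updated, never the policy weights.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        zs = mu + sigma * rng.standard_normal((pop, dim))
        rewards = np.array([sparse_reward(policy_act(obs, z), goal) for z in zs])
        dists = np.array([np.linalg.norm(policy_act(obs, z) - goal) for z in zs])
        # Rank by reward, breaking ties by distance so elites stay
        # informative even when every sampled reward is still 0.
        order = np.lexsort((dists, -rewards))[:elite]
        mu, sigma = zs[order].mean(axis=0), zs[order].std(axis=0) + 1e-3
    return mu

# Demo with an identity "frozen policy" so the action equals the noise.
goal = np.array([0.3, -0.2])
best = optimize_noise(lambda o, z: z, None, goal, dim=2)
```

The key design point mirrors the article: exploration happens only through the noise distribution, so every candidate action is still produced by the behavior prior and stays inside its human-like support.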

Key Points
  • Uses a frozen diffusion policy as a behavior prior, refined via RL on initial noise, requiring only sparse rewards and no reward engineering.
  • Achieved a 90.5% success rate on industrial assembly tasks and 85% on long-horizon manipulation, outperforming baseline methods.
  • Policies were successfully distilled and transferred to real robotic hardware, validating the sim-to-real pipeline.
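The distillation step in the last point can be sketched as a standard DAgger loop (interfaces here are hypothetical; the real system distills a state-based expert into an image-conditioned visuomotor student): the student drives the rollouts while the expert relabels every visited state, and the student is refit on the aggregated dataset.

```python
import numpy as np

def dagger_distill(expert, student_fit, student_act, env_step, reset,
                   rounds=5, horizon=20):
    """Minimal DAgger loop (hypothetical interfaces, not the paper's code)."""
    X, Y = [], []
    for _ in range(rounds):
        s = reset()
        for _ in range(horizon):
            X.append(s.copy())
            Y.append(expert(s))              # expert relabels visited states
            s = env_step(s, student_act(s))  # but the student drives the rollout
        student_fit(np.array(X), np.array(Y))  # refit on the aggregated data
    return np.array(X), np.array(Y)

# Toy instantiation: a stabilizing linear "expert" and a linear student.
W = np.zeros((2, 2))
def student_act(s):
    return W @ s
def student_fit(X, Y):
    global W
    W = np.linalg.lstsq(X, Y, rcond=None)[0].T

rng = np.random.default_rng(1)
dagger_distill(lambda s: -0.5 * s, student_fit, student_act,
               lambda s, a: s + a, lambda: rng.normal(size=2))
```

Collecting labels on the student's own visited states, rather than only on expert trajectories, is what lets the distilled policy recover from its own mistakes at deployment time.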

Why It Matters

Dramatically reduces the cost and complexity of training robust robot controllers, moving us closer to scalable automation.