Chasing Autonomy: Dynamic Retargeting and Control Guided RL for Performant and Controllable Humanoid Running
Researchers achieved outdoor humanoid running with obstacle avoidance using a novel reinforcement learning and motion-retargeting pipeline.
A team from Caltech and UT Austin has published a breakthrough paper titled 'Chasing Autonomy: Dynamic Retargeting and Control Guided RL for Performant and Controllable Humanoid Running.' The research addresses a key limitation in robotics: while reinforcement learning (RL) can create dynamic motions, these are often restricted to simple playback of a single recorded movement. The team's pipeline starts by taking a single human motion demonstration and using an optimization routine with hard constraints to dynamically retarget it. This process generates an improved library of periodic reference motions, providing a richer foundation for the RL policy to learn from.
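The retargeting step described above can be sketched as a constrained trajectory optimization: fit a periodic robot joint trajectory to the human demo while enforcing the robot's joint limits as hard constraints. The 1-DoF demo, joint limits, and cost weights below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of constraint-based dynamic retargeting.
# All dimensions, limits, and weights are illustrative stand-ins.
import numpy as np
from scipy.optimize import minimize

T = 20                                                   # timesteps in one gait cycle
human_demo = 0.9 * np.sin(np.linspace(0, 2 * np.pi, T))  # stand-in 1-DoF human demo
q_min, q_max = -0.6, 0.6                                 # assumed robot joint limits (rad)

def objective(q):
    # Track the human motion; lightly penalize jerky retargeted motion.
    tracking = np.sum((q - human_demo) ** 2)
    smoothness = np.sum(np.diff(q) ** 2)
    return tracking + 0.1 * smoothness

def periodicity(q):
    # Hard constraint: the gait cycle must close (q[0] == q[-1]).
    return q[0] - q[-1]

res = minimize(
    objective,
    x0=np.clip(human_demo, q_min, q_max),     # feasible initial guess
    bounds=[(q_min, q_max)] * T,              # joint limits as hard bounds
    constraints=[{"type": "eq", "fun": periodicity}],
)
retargeted = res.x                            # one periodic reference motion
```

Repeating this optimization under varied speed or stride targets is one way such a library of periodic references could be generated from a single demo.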
The core innovation is a goal-conditioned, control-guided reward structure that trains the RL policy to track these dynamically optimized references. This approach prioritizes both performance (speed and stability) and controllability (the ability to follow velocity commands). The team successfully deployed the resulting policy on a Unitree G1 humanoid robot in the real world. The hardware results are impressive: the robot achieved running speeds of up to 3.3 meters per second and demonstrated the endurance to traverse hundreds of meters.
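The dual objective described above can be sketched as a reward with two terms: one rewarding tracking of the retargeted reference (performance) and one rewarding matching a commanded velocity (controllability). The weights and error scales here are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of a goal-conditioned, control-guided reward.
# Weights and exponential scales are illustrative, not from the paper.
import numpy as np

def running_reward(joint_pos, ref_joint_pos, base_vel, cmd_vel,
                   w_track=0.6, w_cmd=0.4):
    # Performance term: exponential of negative reference-tracking error.
    pose_err = np.sum((joint_pos - ref_joint_pos) ** 2)
    r_track = np.exp(-5.0 * pose_err)
    # Controllability term: exponential of negative velocity-command error.
    vel_err = np.sum((base_vel - cmd_vel) ** 2)
    r_cmd = np.exp(-2.0 * vel_err)
    return w_track * r_track + w_cmd * r_cmd
```

With perfect tracking and command following, both exponentials equal one and the reward reaches its maximum of 1.0; any deviation in either term lowers it, so the policy cannot trade one objective entirely for the other.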
Most significantly, the controller proved reliable enough to be integrated into a complete autonomy stack. The researchers demonstrated the system's practical utility by performing dynamic obstacle avoidance while running outdoors. This moves beyond lab-bound demonstrations and shows the pipeline's potential for real-world deployment where robots must perceive their environment, plan paths, and execute agile maneuvers simultaneously.
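A velocity-commanded policy makes this integration natural: perception and planning only need to reduce the scene to a velocity command for the running controller to track. The potential-field steering rule below is an illustrative stand-in for the paper's planner, and all names and thresholds are assumptions.

```python
# Hypothetical sketch of the autonomy-stack interface: obstacles and a
# goal are reduced to a single velocity command, which the learned
# running policy consumes. The steering rule is a simple potential
# field, used here only for illustration.
import numpy as np

def plan_velocity_command(robot_pos, goal_pos, obstacles, v_max=3.0):
    # Attractive pull toward the goal.
    to_goal = goal_pos - robot_pos
    direction = to_goal / (np.linalg.norm(to_goal) + 1e-9)
    # Repulsive push away from each nearby obstacle.
    for obs in obstacles:
        away = robot_pos - obs
        dist = np.linalg.norm(away)
        if dist < 2.0:                       # only react within 2 m
            direction += (away / (dist + 1e-9)) * (2.0 - dist)
    direction /= np.linalg.norm(direction) + 1e-9
    return v_max * direction                 # velocity command for the policy
```

Because the policy is trained to follow arbitrary velocity commands, swapping in a more capable planner requires no retraining of the low-level controller.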
- The pipeline uses dynamic retargeting to create a varied motion library from just one human demo, solving the 'single motion playback' problem.
- A control-guided RL reward structure enabled the Unitree G1 robot to run at 3.3 m/s and cover hundreds of meters of outdoor terrain.
- The controller was integrated into a full perception and planning stack, successfully executing obstacle avoidance while running, a key step toward true autonomy.
Why It Matters
This work enables more durable, agile, and truly autonomous humanoid robots for real-world applications like search & rescue or logistics.