Multi-Gait Learning for Humanoid Robots Using Reinforcement Learning with Selective Adversarial Motion Prior
A new RL framework uses a 'selective' adversarial approach to teach a 12-DOF humanoid robot five gaits, including walking, running, and jumping.
A research team has published a paper detailing a novel reinforcement learning (RL) framework that enables a humanoid robot to learn five distinct locomotion gaits within a single, unified policy. The approach, centered on a 'selective Adversarial Motion Prior (AMP)' strategy, intelligently applies adversarial training only to periodic, stability-critical motions like walking and stair climbing. This selective application accelerates learning and suppresses erratic behavior for these gaits, while deliberately omitting AMP for highly dynamic skills like running and jumping, where its regularization would over-constrain and limit agility.
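The gating idea above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the gait names, the `style_weight` blend, and the discriminator-to-reward mapping (taken from the original AMP formulation) are all assumptions.

```python
# Hypothetical sketch of a selective AMP reward: the adversarial "style"
# term from a motion-prior discriminator is added only for gaits flagged
# as stability-critical; dynamic gaits keep the pure task reward.

# Gaits for which the adversarial motion prior is active
# (per the paper: walking, goose-stepping, stair climbing).
AMP_GAITS = {"walk", "goose_step", "stair_climb"}

def total_reward(gait, task_reward, disc_logit, style_weight=0.5):
    """Blend task and AMP style rewards for selected gaits only.

    disc_logit: discriminator output D(s, s') for the current transition.
    The style reward max(0, 1 - 0.25*(D - 1)^2) follows the original AMP
    formulation; the linear blend is an illustrative choice.
    """
    if gait in AMP_GAITS:
        style_reward = max(0.0, 1.0 - 0.25 * (disc_logit - 1.0) ** 2)
        return (1.0 - style_weight) * task_reward + style_weight * style_reward
    # Dynamic gaits (run, jump): task reward only, so AMP regularization
    # cannot over-constrain agility.
    return task_reward
```

Because the discriminator term is simply absent for running and jumping, those gaits are optimized purely for task performance, which is the claimed source of the agility preserved by the selective scheme.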
The policies were trained in simulation using Proximal Policy Optimization (PPO) with domain randomization and then deployed via zero-shot sim-to-real transfer onto a physical 12-degree-of-freedom (DOF) humanoid robot. Quantitative results show the selective AMP method outperforms a baseline that applies AMP uniformly to all five gaits: it achieved faster convergence, lower tracking error, and higher success rates on stability-focused tasks without sacrificing performance on the dynamic ones, demonstrating a more efficient and capable framework for versatile robot locomotion.
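Domain randomization of the kind described above is typically implemented by resampling physics parameters each episode. The sketch below is a minimal, assumed setup; the parameter names and ranges are illustrative placeholders, not values reported in the paper.

```python
import random

# Hypothetical domain-randomization ranges for sim-to-real transfer.
# The parameters and bounds here are illustrative assumptions.
RANDOMIZATION = {
    "ground_friction": (0.5, 1.25),       # friction coefficient
    "link_mass_scale": (0.9, 1.1),        # multiplier on nominal link masses
    "motor_strength_scale": (0.85, 1.15), # multiplier on torque limits
    "obs_noise_std": (0.0, 0.02),         # sensor noise added to observations
}

def sample_episode_params(rng=random):
    """Draw one set of physics parameters at the start of each episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANDOMIZATION.items()}
```

Resampling at every episode forces the policy to remain robust across the whole parameter range, which is what makes zero-shot transfer to the physical robot plausible without per-robot fine-tuning.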
- Uses a 'selective Adversarial Motion Prior (AMP)' applied only to walking, goose-stepping, and stair climbing for stability.
- Trains a single policy to master five distinct gaits, deployed on a physical 12-DOF robot via zero-shot sim-to-real transfer.
- Outperformed uniform AMP, achieving faster convergence and higher success rates without compromising dynamic motion agility.
Why It Matters
This advances versatile, real-world humanoid robots by making multi-gait training more efficient and effective within a single AI model.