What Matters for Simulation to Online Reinforcement Learning on Real Robots
After 100 real-world robot training runs, researchers identified which RL practices actually work on hardware.
A research team from ETH Zurich, led by Yarden As and including Markus Wulfmeier, has published a landmark empirical study titled 'What Matters for Simulation to Online Reinforcement Learning on Real Robots.' The paper, available on arXiv (2602.20220), systematically investigates the specific algorithmic, systems, and experimental design choices that enable successful online reinforcement learning (RL) on physical hardware. This work addresses a critical gap in robotics research, where many implementation details are often left implicit, leading to inconsistent results and high engineering overhead when transferring simulated policies to the real world.
The study's conclusions are based on an extensive experimental campaign of 100 real-world training runs conducted across three distinct robotic platforms. The researchers performed systematic ablations to test common assumptions and found that several widely adopted defaults in RL practice can actually be detrimental to real-robot training. Conversely, they identified a coherent set of robust design choices that consistently yielded stable learning across different tasks and hardware. The result is the first large-sample empirical guide for practitioners: a blueprint for reducing the trial-and-error engineering typically required to get online RL working reliably on physical systems.
- Based on 100 real-world training runs across three distinct robotic platforms.
- Found that several widely used RL defaults are harmful for real-robot deployment.
- Identified a specific set of robust design choices that yield stable learning across tasks and hardware.
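The article does not reproduce the paper's actual recipe, but the setting it studies, online RL in which the agent learns directly from transitions collected on hardware rather than purely in simulation, can be illustrated with a toy loop. Everything below (the `ToyEnv` environment, the placeholder exploration policy, the step budget) is a hypothetical stand-in for illustration only, not the authors' method or platforms:

```python
import random
from collections import deque

class ToyEnv:
    """Hypothetical 1-D 'reach the target' task standing in for a real robot:
    the state is a position, the action is a step along the line."""
    def __init__(self, target=5.0):
        self.target = target
        self.pos = 0.0

    def reset(self):
        self.pos = 0.0
        return self.pos

    def step(self, action):
        self.pos += action
        reward = -abs(self.target - self.pos)      # dense negative-distance reward
        done = abs(self.target - self.pos) < 0.1   # close enough to the target
        return self.pos, reward, done


def online_rl_loop(env, episodes=10, buffer_size=1000, seed=0):
    """Generic off-policy online loop: act on the (toy) hardware, store
    transitions in a replay buffer, and accumulate per-episode returns.
    A real agent would also take gradient updates from the buffer."""
    rng = random.Random(seed)
    replay = deque(maxlen=buffer_size)  # experience replay buffer
    returns = []
    for _ in range(episodes):
        state, done, ep_ret = env.reset(), False, 0.0
        for _ in range(50):  # per-episode step budget
            action = rng.uniform(-1.0, 2.0)  # placeholder exploration policy
            next_state, reward, done = env.step(action)
            replay.append((state, action, reward, next_state, done))
            # A learning agent would sample a minibatch from `replay` here
            # and update its critic/actor before the next hardware step.
            state, ep_ret = next_state, ep_ret + reward
            if done:
                break
        returns.append(ep_ret)
    return replay, returns
```

The design choices the paper ablates (which algorithmic defaults to keep, how to structure the training system around a loop like this) are exactly where, per the study, naive defaults can break down on physical hardware.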
Why It Matters
Provides a practical blueprint to reduce engineering effort and increase success rates when deploying RL on physical robots.