Too much sim2real backfires, new paper proposes sim2sim2real solution
Sim2real efforts cause 'simulator lock-in' and hinder robot policy learning, researchers find.
In a new paper published on arXiv (2606.02636), researchers from the University of Texas at Austin argue that the robotics community's drive for sim2real transfer has created a counterproductive dynamic. They claim that efforts to make simulations perfectly mimic real-world physics often trap policies in 'simulator lock-in', where models perform well in simulation but fail to explore the diverse strategies needed for robust real-world operation. The root cause, they say, is that imposing real-world constraints during training penalizes novel or unconventional behaviors that might actually be effective on hardware. The result is poor policy exploration and, ultimately, weaker transfer performance.
To break this cycle, the team proposes a 'sim2sim2real' paradigm. Instead of going directly from simulation to reality, they advocate an intermediate step: first train policies in a simulation that enforces only the robot's kinematic constraints (joint limits, geometry) and ignores physics such as friction and contact dynamics. This lets the policy explore a richer set of behaviors. Then, in a second simulation phase, physics is gradually introduced. The key insight is that the robot's kinematics are the only truly universal constraint across all simulators and real environments. By decoupling kinematic constraints from physics constraints, the approach could produce policies that are both more creative in simulation and more adaptable when transferred to hardware. The paper provides both a theoretical diagnosis and early experimental evidence, and it suggests a significant shift in how the robotics community might design training pipelines for dexterous manipulation and locomotion.
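The two-phase idea can be illustrated with a small sketch. This is not the paper's implementation, which the summary above does not detail. It is a hypothetical schedule that assumes a linear ramp, and every name in it (`JointLimits`, `clamp_to_limits`, `physics_scale`, `sim_params`), along with the step counts and target values, is invented for illustration. Joint limits are enforced at every step, while friction and contact stiffness stay at zero during the kinematics-only phase and are then scaled up gradually.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class JointLimits:
    """Hypothetical container for per-joint position limits (radians)."""
    lower: Sequence[float]
    upper: Sequence[float]


def clamp_to_limits(q: Sequence[float], limits: JointLimits) -> List[float]:
    """Kinematic constraint: applied in every phase, never relaxed."""
    return [min(max(qi, lo), hi)
            for qi, lo, hi in zip(q, limits.lower, limits.upper)]


def physics_scale(step: int, phase1_steps: int, ramp_steps: int) -> float:
    """Return 0.0 during the kinematics-only phase, then ramp linearly to 1.0.

    The linear ramp is an assumption; ramp_steps must be positive.
    """
    if step < phase1_steps:
        return 0.0
    return min(1.0, (step - phase1_steps) / ramp_steps)


def sim_params(step: int,
               phase1_steps: int = 1000,
               ramp_steps: int = 4000,
               target_friction: float = 0.8,
               target_contact_stiffness: float = 1.0) -> Dict[str, float]:
    """Scale illustrative physics parameters by the current schedule value."""
    s = physics_scale(step, phase1_steps, ramp_steps)
    return {
        "friction": s * target_friction,
        "contact_stiffness": s * target_contact_stiffness,
    }


if __name__ == "__main__":
    limits = JointLimits(lower=(-1.0, -1.0), upper=(1.0, 1.0))
    print(clamp_to_limits([2.0, -3.0], limits))  # out-of-range joints are clipped
    for step in (0, 1000, 3000, 5000):
        print(step, sim_params(step))
```

In a real pipeline the returned dictionary would be mapped onto whatever parameters the chosen simulator exposes. The point of the sketch is only the ordering: kinematic limits are constant, and physics arrives later and gradually.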
- Current sim2real practices cause 'simulator lock-in', where policies overfit to simulation physics and fail to explore novel behaviors.
- The paper identifies real-world constraints imposed during training as the primary cause of the misaligned incentives that impede policy learning.
- The proposed 'sim2sim2real' paradigm uses robot kinematics as the sole design constraint in the first simulation phase, increasing behavioral diversity.
Why It Matters
For robotics engineers, this research suggests rethinking simulation design to prioritize exploration over realistic physics early in training, with physics introduced gradually afterward, for better real-world transfer.