Robotics

New SDPG method trains visual RL policies in hours on a single RTX 4080

arXiv cs.RO May 27, 2026

⚡ SDPG cuts compute by orders of magnitude and trains robots in hours.

Deep Dive

A team of researchers led by Haoxiang You from Yale University has introduced Stochastic Decoupled Policy Gradient (SDPG), a visual reinforcement learning method that sharply reduces the computational cost of training visuomotor control policies. SDPG estimates policy gradients from random perturbations of trajectory rollouts, so it no longer depends on the massive batches of rendered environments typical of on-policy RL. As a result, the entire training process runs end-to-end on a single NVIDIA RTX 4080 GPU in a few hours while matching or improving on baseline performance. On visual MuJoCo benchmarks, SDPG consistently beats baselines on training time, memory usage, and cumulative reward, which puts it within reach of researchers without large-scale compute clusters.
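To make the idea of estimating a gradient from randomly perturbed rollouts concrete, here is a minimal sketch of a generic antithetic (zeroth-order) perturbation estimator on a toy 1-D task. It is not SDPG itself: the article does not give the paper's equations, so the toy dynamics, the choice to perturb policy parameters, and every name below (`rollout_return`, `perturbation_gradient`, `train`) are assumptions made purely for illustration.

```python
import random

def rollout_return(theta, steps=20, x0=1.0):
    """Return of a toy 1-D point mass under the linear policy a = theta[0]*x + theta[1].

    Hypothetical stand-in for a rendered simulator rollout.
    """
    x, total = x0, 0.0
    for _ in range(steps):
        a = theta[0] * x + theta[1]
        total -= x * x + 0.1 * a * a  # penalise distance from the origin and control effort
        x = x + 0.1 * a               # simple integrator dynamics
    return total

def perturbation_gradient(theta, sigma=0.05, pairs=64, rng=None):
    """Estimate dReturn/dtheta from pairs of randomly perturbed rollouts.

    Generic antithetic estimator (assumed here, not taken from the paper):
    g ~= mean over eps of (R(theta + sigma*eps) - R(theta - sigma*eps)) / (2*sigma) * eps
    """
    rng = rng or random.Random(0)
    grad = [0.0] * len(theta)
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = rollout_return([t + sigma * e for t, e in zip(theta, eps)])
        minus = rollout_return([t - sigma * e for t, e in zip(theta, eps)])
        scale = (plus - minus) / (2.0 * sigma * pairs)
        for i, e in enumerate(eps):
            grad[i] += scale * e
    return grad

def train(iterations=200, lr=0.005, seed=0):
    """Plain gradient ascent on the return using the perturbation estimate."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(iterations):
        g = perturbation_gradient(theta, rng=rng)
        theta = [t + lr * gi for t, gi in zip(theta, g)]
    return theta

if __name__ == "__main__":
    theta = train()
    print("return before:", rollout_return([0.0, 0.0]))
    print("return after: ", rollout_return(theta))
```

The estimator only needs returns from rollouts, never gradients through the simulator or renderer, which is the general property that makes perturbation-based methods attractive when rendering is the bottleneck. How SDPG structures its perturbations and decouples the gradient is described in the original paper, not here.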

The paper also introduces a new suite of realistic visual robotics benchmarks spanning dexterous manipulation and challenging locomotion tasks. Crucially, the authors demonstrate effective sim-to-real transfer on physical hardware, showing that policies trained purely in simulation with SDPG can directly control real robots. The simulation-to-reality gap is a major hurdle in robotics, and SDPG's efficiency makes the iterative training needed to close it much more practical. The method's low compute requirements could accelerate progress in robot learning by letting more researchers experiment with sophisticated visual policies without expensive infrastructure.

Key Points

SDPG trains visual RL policies end-to-end on a single RTX 4080 GPU in a few hours, cutting compute by orders of magnitude.
It uses random perturbations of trajectory rollouts to estimate policy gradients, requiring far fewer batch-rendered environments.
It outperforms baselines on visual MuJoCo benchmarks and demonstrates successful sim-to-real transfer on physical robots.

Why It Matters

SDPG makes visual robot learning accessible to any lab with a single GPU, accelerating real-world deployment.
