Neuromorphic RL lets robots learn on the go with 4.3× better memory efficiency
A new equilibrium-propagation method trains quadruped robot controllers with local updates instead of backpropagation.
Reinforcement learning has enabled robust quadruped locomotion over complex terrain, but most controllers are trained offline via backpropagation and deployed as fixed policies, which limits their ability to adapt to terrain variation, payload changes, or actuator wear. Researchers Zhuangyu Han and Abhronil Sengupta tackle this by proposing an equilibrium propagation (EP)-based Proximal Policy Optimization (PPO) framework that replaces global backpropagation with local neural state updates. Their controller pairs a bio-inspired central pattern generator (CPG), which produces the rhythmic gait, with a residual policy for postural adjustment. Training uses an EP-compatible variant of PPO with a novel output-nudging signal and two-sided ratio clipping, which stabilize policy updates during relaxation.
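The summary does not give the authors' update equations, so the following is a minimal sketch of classic equilibrium propagation (Scellier and Bengio's free/nudged two-phase scheme) on a toy supervised 2-3-1 network, not the EP-PPO controller itself. It shows the core mechanism: the network first relaxes to an energy minimum with its outputs free, then relaxes again with its outputs weakly nudged toward a target, and each weight is updated only from the settled states of the two units it connects. The network size, hard-sigmoid units, step sizes, initial weights, and the names `relax` and `ep_step` are illustrative assumptions.

```python
# Toy equilibrium propagation (EP) on a 2-3-1 energy-based network.
# Assumptions (not from the article): hard-sigmoid units with states clipped
# to [0, 1], squared-error cost, classic free/nudged contrastive update.

def clip01(v):
    return min(1.0, max(0.0, v))

def relax(x, W1, W2, b1, b2, beta=0.0, target=None, steps=60, dt=0.2):
    """Settle hidden/output states by gradient descent on the energy
    (plus beta * cost in the nudged phase). Inputs x stay clamped."""
    h = [0.0] * len(b1)
    y = [0.0] * len(b2)
    for _ in range(steps):
        # Each unit only sees its neighbours' states: the dynamics are local.
        dh = [sum(x[i] * W1[i][j] for i in range(len(x)))
              + sum(W2[j][k] * y[k] for k in range(len(y)))
              + b1[j] - h[j] for j in range(len(h))]
        dy = [sum(W2[j][k] * h[j] for j in range(len(h))) + b2[k] - y[k]
              for k in range(len(y))]
        if beta > 0.0:
            # Nudged phase: weakly pull the outputs toward the target.
            dy = [dy[k] + beta * (target[k] - y[k]) for k in range(len(y))]
        h = [clip01(h[j] + dt * dh[j]) for j in range(len(h))]
        y = [clip01(y[k] + dt * dy[k]) for k in range(len(y))]
    return h, y

def ep_step(x, target, W1, W2, b1, b2, beta=0.5, lr=0.1):
    """One EP update (in place). Returns the free-phase loss before updating."""
    h0, y0 = relax(x, W1, W2, b1, b2)                          # free phase
    hb, yb = relax(x, W1, W2, b1, b2, beta=beta, target=target)  # nudged phase
    scale = lr / beta
    # Contrastive Hebbian-style rule: (pre * post) nudged minus free.
    for i in range(len(x)):
        for j in range(len(b1)):
            W1[i][j] += scale * (x[i] * hb[j] - x[i] * h0[j])
    for j in range(len(b1)):
        for k in range(len(b2)):
            W2[j][k] += scale * (hb[j] * yb[k] - h0[j] * y0[k])
        b1[j] += scale * (hb[j] - h0[j])
    for k in range(len(b2)):
        b2[k] += scale * (yb[k] - y0[k])
    return 0.5 * sum((t - v) ** 2 for t, v in zip(target, y0))

x, target = [1.0, 0.5], [0.8]
W1 = [[0.2, -0.1, 0.3], [0.1, 0.4, -0.2]]
W2 = [[0.3], [-0.2], [0.25]]
b1, b2 = [0.0, 0.0, 0.0], [0.0]
losses = [ep_step(x, target, W1, W2, b1, b2) for _ in range(100)]
print(f"free-phase loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

No computation graph or history of relaxation steps is stored: the update needs only the two settled states. That is the intuition behind EP's memory advantage over BPTT, whose memory grows with the number of unrolled steps.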
The team tested the approach on a 12-degree-of-freedom Unitree A1 quadruped in a two-stage uneven-terrain locomotion task. Results show the EP-trained controller achieves performance comparable to a conventional backpropagation-trained PPO baseline in success rate, velocity tracking, actuator power, and body stability. Critically, it delivers a 4.3× improvement in GPU memory efficiency compared with backpropagation through time (BPTT). These findings demonstrate that local equilibrium-based learning can support high-dimensional embodied locomotion, offering an algorithmic foundation for energy-aware on-robot adaptation and fine-tuning in real-world settings.
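The controller structure described above, a CPG that supplies the rhythm plus a residual policy that corrects posture, can be sketched for a 12-joint quadruped as follows. The summary does not specify the oscillator model or joint mapping, so this sketch assumes Kuramoto-style coupled phase oscillators locked to a trot pattern, illustrative nominal joint angles, and a hypothetical `residual_policy` stub in place of the learned EP-trained network.

```python
import math

# Hypothetical trot pattern: diagonal legs (FR+RL, FL+RR) move in phase.
LEGS = ["FR", "FL", "RR", "RL"]
TROT_OFFSETS = [0.0, math.pi, math.pi, 0.0]

# Illustrative standing pose per leg (hip, thigh, calf) in radians; an assumption.
NOMINAL = [0.0, 0.9, -1.8]

def cpg_step(phases, dt, freq=2.0, coupling=2.0):
    """Euler step of coupled phase oscillators pulled toward TROT_OFFSETS."""
    new = []
    for i, phi_i in enumerate(phases):
        dphi = 2.0 * math.pi * freq
        for j, phi_j in enumerate(phases):
            desired = TROT_OFFSETS[j] - TROT_OFFSETS[i]
            dphi += coupling * math.sin(phi_j - phi_i - desired)
        new.append(phi_i + dt * dphi)
    return new

def cpg_joint_offsets(phases, thigh_amp=0.3, calf_amp=0.4):
    """Map each leg's phase to (hip, thigh, calf) offsets: 12 values total."""
    offsets = []
    for phi in phases:
        swing = max(0.0, math.sin(phi))  # flex the calf only in the swing half-cycle
        offsets += [0.0, thigh_amp * math.sin(phi), -calf_amp * swing]
    return offsets

def residual_policy(observation):
    """Hypothetical stand-in for the learned residual postural policy."""
    return [0.05 * math.tanh(o) for o in observation]

def joint_targets(phases, observation, residual_limit=0.1):
    """12-DoF targets = nominal pose + CPG rhythm + clipped residual correction."""
    cpg = cpg_joint_offsets(phases)
    res = [max(-residual_limit, min(residual_limit, r))
           for r in residual_policy(observation)]
    return [NOMINAL[k % 3] + cpg[k] + res[k] for k in range(12)]

phases = [0.0, 2.5, 3.5, 0.5]  # arbitrary start near (but not at) the trot pattern
for _ in range(2000):          # 2 s of simulated time at dt = 1 ms
    phases = cpg_step(phases, dt=0.001)

obs = [0.0] * 12               # placeholder observation vector (hypothetical)
q = joint_targets(phases, obs)
print(len(q), round(math.cos(phases[1] - phases[0]), 3))
```

Splitting the work this way keeps the learned part small: the oscillators guarantee a stable rhythm, and the policy only has to learn bounded corrections, which the `residual_limit` clip enforces.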
- Equilibrium propagation (EP) replaces backpropagation with local neural-state updates for quadruped locomotion training
- EP-PPO matches backprop-trained PPO in success rate, velocity tracking, and stability on a 12-DoF A1 robot
- 4.3× GPU memory efficiency gain over backpropagation through time (BPTT)
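The summary mentions two-sided ratio clipping in EP-PPO but does not define it. For reference, standard PPO maximizes min(r·A, clip(r, 1−ε, 1+ε)·A), where r is the new-to-old policy probability ratio and A the advantage; that objective stays unbounded when r moves far in the unfavourable direction (e.g. r ≫ 1 with A < 0). The sketch below contrasts it with one plausible, hypothetical reading of "two-sided" clipping that bounds the ratio on both sides regardless of the advantage sign. It illustrates the idea only and is not the authors' formulation; how the clipped objective is turned into an output nudge is also not described in the summary.

```python
import math

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample (to be maximized)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)

def two_sided_clip_objective(ratio, advantage, eps=0.2):
    """Hypothetical 'two-sided' variant: clip the ratio into [1-eps, 1+eps]
    before weighting the advantage, so the surrogate stays bounded
    whatever the sign of the advantage."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return clipped * advantage

# Ratio from (hypothetical) new and old log-probabilities of one action.
ratio = math.exp(-0.2 - (-0.9))  # e^0.7, about 2.01: the policy moved far
for adv in (1.0, -1.0):
    print(adv, ppo_clip_objective(ratio, adv), two_sided_clip_objective(ratio, adv))
```

With a positive advantage both objectives cap at 1.2; with a negative advantage standard PPO returns about −2.01 while the two-sided variant stays at −1.2. A bounded target of this kind is one reason such clipping could help keep updates stable when they are applied through an iterative relaxation rather than an exact gradient.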
Why It Matters
Lays an algorithmic foundation for energy-aware, adaptive robot control directly on hardware, which is crucial for real-world deployment.