Robotics

Robust Koopman-CBF SAC achieves zero safety violations on CartPole in RL training

New safety filter uses Koopman theory to enforce constraints without exact dynamics, hitting zero violations on CartPole.

Deep Dive

Safe reinforcement learning for robotics demands policies that satisfy constraints during both training and deployment. Control barrier functions (CBFs) offer a rigorous way to enforce safety, but they typically require accurate dynamics models and hand-crafted barrier certificates, both of which are hard to obtain in model-free RL. To bridge this gap, Kushwaha and Biron introduce Robust Koopman-CBF SAC, a safety-filtered actor-critic method that learns a finite-dimensional Koopman predictor from data. The predictor lifts the state into a space where the dynamics are approximately linear, so the CBF condition becomes an affine constraint on the action, and a quadratic-program safety layer minimally adjusts each action to keep the safe set forward invariant. Because a finite-dimensional Koopman model is only an approximation, the authors tighten the CBF condition with a projected residual margin estimated from held-out rollout data. The critic is trained on the executed safe action, while the actor is regularized toward the Koopman-CBF feasible set, so the policy relies less on the filter as training progresses.
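
To make the pipeline above concrete, here is a minimal Python sketch of its three pieces: an EDMD-style least-squares fit of a lifted linear predictor, a residual margin taken from held-out data, and a safety filter for a single affine barrier constraint. Everything specific in it is an assumption for illustration, not the authors' implementation: the dictionary lift, the toy one-dimensional cart dynamics step, the discrete-time CBF form h(z_next) >= (1 - gamma) * h(z) + margin, the choice of the maximum held-out residual as the margin, and all function names. With only one constraint, the quadratic program reduces to a closed-form projection onto a half-space, so no QP solver is needed here.

```python
import numpy as np

def lift(x):
    """Hypothetical dictionary: the raw state plus a constant feature."""
    x = np.atleast_2d(x)
    return np.hstack([x, np.ones((x.shape[0], 1))])

def fit_edmd(X, U, Y):
    """Least-squares fit of z_next ~= A z + B u on lifted data."""
    Z, Zn = lift(X), lift(Y)
    W, *_ = np.linalg.lstsq(np.hstack([Z, U]), Zn, rcond=None)
    p = Z.shape[1]
    return W[:p].T, W[p:].T  # A is (p, p), B is (p, m)

def residual_margin(A, B, c, X, U, Y):
    """Assumed margin: largest held-out one-step residual projected onto c."""
    R = lift(Y) - (lift(X) @ A.T + U @ B.T)
    return float(np.max(np.abs(R @ c)))

def safety_filter(u_nom, x, A, B, c, d, gamma, margin):
    """Solve min ||u - u_nom||^2 s.t. a.u >= b for one affine constraint."""
    z = lift(x)[0]
    h = c @ z + d                       # barrier value, affine in lifted state
    a = B.T @ c                         # how the action moves the barrier
    b = (1.0 - gamma) * h - c @ (A @ z) - d + margin
    slack = b - a @ u_nom
    if slack <= 0.0:                    # nominal action already satisfies it
        return u_nom
    return u_nom + slack / (a @ a) * a  # projection onto the half-space

# Toy demo on an assumed system (not from the paper): state = [pos, vel].
rng = np.random.default_rng(0)
dt = 0.05

def step(x, u):
    pos, vel = x
    return np.array([pos + dt * vel, vel + dt * (u[0] - 0.1 * np.sin(vel))])

X = rng.uniform(-2.0, 2.0, size=(500, 2))
U = rng.uniform(-5.0, 5.0, size=(500, 1))
Y = np.array([step(x, u) for x, u in zip(X, U)])

A, B = fit_edmd(X[:400], U[:400], Y[:400])        # training split
c, d = np.array([0.0, -1.0, 0.0]), 1.0            # h(x) = 1 - vel (velocity barrier)
gamma = 0.5
margin = residual_margin(A, B, c, X[400:], U[400:], Y[400:])  # held-out split

x = np.array([0.0, 0.9])
u_nom = np.array([5.0])                           # aggressive nominal action
u_safe = safety_filter(u_nom, x, A, B, c, d, gamma, margin)
print("nominal:", u_nom, "filtered:", u_safe, "margin:", margin)
```

The filter leaves the action untouched whenever the nominal action already meets the tightened condition, and otherwise moves it the shortest distance needed. With several simultaneous constraints or action bounds the closed form no longer applies and a general QP solver would be required.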

The method achieves zero constraint violations on CartPole stabilization and tracking tasks while matching or exceeding the returns of unconstrained soft actor-critic (SAC), a sign that safety need not come at the cost of performance on these tasks. On the more challenging, high-dimensional Safety Gymnasium locomotion tasks, Robust Koopman-CBF SAC reduces violations in some settings but also exposes meaningful limitations: first-order velocity barrier constraints and linear EDMD (extended dynamic mode decomposition) models struggle to capture the complex dynamics, so some violations remain. The authors explicitly highlight the need for high-order and multi-step Koopman-CBF extensions. These results position robust Koopman-CBF filters as a promising bridge between model-free RL and certifiable safety, while clarifying the structural conditions, such as low-dimensional, smooth dynamics, under which the approach remains effective. The code is available on GitHub.

Key Points
  • Zero constraint violations on CartPole stabilization and tracking tasks
  • Matches or exceeds unconstrained SAC returns while enforcing safety
  • Reveals limitations in high-dimensional tasks, motivating high-order and multi-step Koopman-CBF extensions

Why It Matters

Offers a path to combining model-free reinforcement learning with certifiable safety for robotics without requiring an exact dynamics model, although the zero-violation result so far holds only on low-dimensional CartPole tasks.