Research & Papers

KBSE: Kernel-based safe RL exploration without reward loss

Learns barrier functions via kernel embeddings for safe exploration

Deep Dive

Safety remains a critical bottleneck for deploying deep RL in real-world settings. A common approach uses barrier functions, which map states to values that decrease in expectation, to bound the probability of reaching unsafe states. Previous attempts to learn barriers from data required large datasets or restrictive assumptions about system dynamics. The new KBSE algorithm addresses both limitations by using kernel embeddings to represent barrier functions during exploration. Barriers are computed iteratively as conditional mean embeddings, so the safety guarantees tighten as more data is collected.
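To make the conditional mean embedding idea concrete, here is a minimal sketch of the standard empirical estimator applied to a barrier function. It is not the paper's implementation: the function names, the Gaussian kernel, and the `reg` and `lengthscale` parameters are all assumptions chosen for illustration. Given observed transitions, it estimates the expected barrier value at the next state, conditioned on the current state.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    # Pairwise Gaussian kernel between rows of a (n, d) and rows of b (m, d).
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def cme_expected_barrier(x_query, x_data, x_next, barrier,
                         reg=1e-3, lengthscale=1.0):
    """Estimate E[barrier(x') | x = x_query] from transitions (x_data, x_next).

    Hypothetical helper: uses the regularized empirical conditional mean
    embedding, with weights w(x) = (K + n * reg * I)^{-1} k(x).
    """
    n = x_data.shape[0]
    gram = rbf_kernel(x_data, x_data, lengthscale)       # (n, n)
    k_query = rbf_kernel(x_data, x_query, lengthscale)   # (n, m)
    weights = np.linalg.solve(gram + n * reg * np.eye(n), k_query)
    # Weighted sum of barrier values at the observed next states.
    return weights.T @ barrier(x_next)                   # (m,)
```

Here `barrier` is any callable that maps an array of states with shape (n, d) to n scalar values. To condition on the action as well, one option is to concatenate state and action into the rows of `x_data` and `x_query`. Re-solving the linear system as transitions accumulate gives an estimate that is refined with more data, which mirrors the iterative flavor described above.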

KBSE integrates barrier learning directly into the RL loop. When the learned barrier indicates a safety violation, the algorithm intervenes and replaces the proposed action with a safe one. Exploration is thereby restricted to actions for which the probability of entering unsafe states stays low. The authors tested KBSE on several complex continuous control benchmarks. Results show that KBSE synthesizes policies that are probabilistically safe without sacrificing accumulated reward, making it suitable for safety-critical applications like robotics and autonomous systems.
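The intervention step can be sketched as a simple action filter. The summary does not say how KBSE picks the replacement action, so everything below is an assumption: the name `shield_action`, the finite candidate set, the `slack` parameter, and the rule of choosing the safe candidate closest to the proposed action. The safety check requires that the predicted expected barrier value after the action not exceed the current barrier value plus a slack.

```python
import numpy as np

def shield_action(state, proposed, candidates, expected_next_barrier,
                  barrier, slack=0.0):
    """Return (action, intervened) for one exploration step.

    Hypothetical sketch: expected_next_barrier(state, action) stands in
    for a learned estimate of E[barrier(next_state) | state, action].
    """
    budget = barrier(state) + slack
    # Keep the policy's action when it passes the barrier condition.
    if expected_next_barrier(state, proposed) <= budget:
        return proposed, False
    risks = np.array([expected_next_barrier(state, a) for a in candidates])
    safe = np.flatnonzero(risks <= budget)
    if safe.size > 0:
        # Among safe candidates, stay as close as possible to the proposal.
        dists = np.array([np.linalg.norm(candidates[i] - proposed)
                          for i in safe])
        return candidates[int(safe[np.argmin(dists)])], True
    # No candidate satisfies the condition: fall back to the lowest risk.
    return candidates[int(np.argmin(risks))], True
```

In a training loop, the returned action would be the one sent to the environment, and the `intervened` flag could be logged to track how often the filter overrides the policy. Picking the nearest safe candidate is one plausible way to limit the effect on reward; other projection rules would fit the same interface.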

Key Points
  • KBSE learns barrier functions as conditional mean embeddings via kernel methods
  • Intervenes during exploration to replace unsafe actions, keeping the probability of entering unsafe states bounded
  • Maintains reward performance on complex continuous control tasks

Why It Matters

Could enable safer real-world deployment of RL in robotics and autonomous systems without giving up accumulated reward.