Research & Papers

Near-Optimal Sample Complexity for Online Constrained MDPs

A new model-based primal-dual algorithm achieves sample complexity for constrained RL that nearly matches known theoretical lower bounds.

Deep Dive

Researchers Chang Liu, Yunfan Li, and Lin F. Yang developed a new model-based primal-dual algorithm for online Constrained Markov Decision Processes (CMDPs). It achieves near-optimal sample complexity of Õ(SAH³/ε²) under relaxed feasibility (small safety violations are allowed) and Õ(SAH⁵/ε²ζ²) under strict feasibility (zero violations, where ζ measures the problem's feasibility margin). This shows that learning safe policies in the online setting can be as sample-efficient as learning with a generative model, enabling faster training of safe autonomous systems.
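To make the primal-dual idea concrete, here is a minimal sketch of a generic Lagrangian primal-dual loop for a tabular, finite-horizon CMDP. This is not the paper's algorithm: it assumes the transition model P is known (the paper's method must estimate it online from samples), and all sizes, the cost budget b, and the step size eta are illustrative assumptions. The primal step computes a best-response policy for the reward r − λ·c; the dual step performs projected gradient ascent on the multiplier λ.

```python
import numpy as np

S, A, H = 2, 2, 5          # toy sizes: states, actions, horizon
b = 1.5                    # illustrative cost budget: require V_c(pi) <= b
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
r = rng.random((S, A))                        # rewards in [0, 1]
c = rng.random((S, A))                        # costs in [0, 1]

def solve_mdp(reward):
    """Finite-horizon value iteration; returns a greedy policy pi[h, s]."""
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = reward + P @ V                    # Q[s, a] = reward + E[V(s')]
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi

def evaluate(pi, signal):
    """Expected cumulative signal (reward or cost) of pi from state 0."""
    V = np.zeros(S)
    for h in reversed(range(H)):
        a = pi[h]
        V = signal[np.arange(S), a] + P[np.arange(S), a] @ V
    return V[0]

lam, eta = 0.0, 0.5
for t in range(200):
    pi = solve_mdp(r - lam * c)                         # primal: best response
    lam = max(0.0, lam + eta * (evaluate(pi, c) - b))   # dual: projected ascent
```

The multiplier λ grows while the current policy overspends the cost budget and shrinks once it is safe, steering the best-response step toward high-reward policies that respect the constraint. The paper's contribution lies in pairing this template with optimistic model estimation so that the whole loop needs only Õ(SAH³/ε²) online samples.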

Why It Matters

Enables faster, safer deployment of RL in critical real-world applications like autonomous driving and healthcare.