Near-Optimal Sample Complexity for Online Constrained MDPs
A new model-based primal-dual algorithm achieves sample complexity matching theoretical lower bounds for online constrained reinforcement learning.
Researchers Chang Liu, Yunfan Li, and Lin F. Yang developed a new model-based primal-dual algorithm for online Constrained Markov Decision Processes (CMDPs). It achieves near-optimal sample complexity of Õ(SAH³/ε²) under relaxed feasibility (small safety violations allowed) and Õ(SAH⁵/ε²ζ²) under strict feasibility (zero violations), where ζ denotes the margin by which the constraints can be strictly satisfied. This shows that learning safe policies in the online setting can be as statistically efficient as learning with a generative model, enabling faster training of safe autonomous systems.
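To see the primal-dual idea in miniature, here is a minimal sketch on a hypothetical two-action constrained bandit rather than a full CMDP. The reward/cost values, budget `tau`, and step size `eta` are illustrative assumptions, and the sketch omits the model estimation and exploration bonuses that the actual algorithm relies on: it only shows how alternating a primal best response with a dual price update drives the long-run cost toward the safety budget.

```python
import numpy as np

# Toy constrained problem: maximize reward subject to average cost <= tau.
# Hypothetical numbers; NOT the paper's algorithm (no model learning, no bonuses).
rewards = np.array([1.0, 0.6])   # per-action rewards (assumed)
costs   = np.array([0.9, 0.1])   # per-action safety costs (assumed)
tau     = 0.5                    # safety budget (assumed)

lam, eta = 0.0, 0.05             # dual variable (constraint price) and step size
cost_log = []
for t in range(5000):
    # Primal step: best response to the Lagrangian r(a) - lam * c(a)
    a = int(np.argmax(rewards - lam * costs))
    # Dual step: raise the price when the constraint is violated, lower it otherwise,
    # projecting back onto lam >= 0
    lam = max(0.0, lam + eta * (costs[a] - tau))
    cost_log.append(costs[a])

# The dual price settles where the policy mixes the two actions so that
# the long-run average cost meets the budget
print(round(float(np.mean(cost_log[-1000:])), 2))  # → 0.5
```

The mechanism mirrors the paper's setting at a high level: the dual variable prices the safety constraint into the objective, and its update converges to the value at which the constraint is met with equality, which is why the strict-feasibility bound depends on how much slack (ζ) the constraints admit.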
Why It Matters
Enables faster, safer deployment of RL in critical real-world applications such as autonomous driving and healthcare.