Model-free LQG control with chance constraints achieves linear convergence
First convergence proof for NPG actor-critic in chance-constrained LQG without model knowledge.
A new paper from Arunava Naha and Subhrakanti Dey tackles a longstanding challenge in optimal control: how to enforce probabilistic risk (chance) constraints in linear-quadratic Gaussian (LQG) systems when the system model is unknown. The authors introduce a natural policy gradient (NPG)-based actor-critic (AC) algorithm that operates on two timescales and is wrapped in a Lagrangian primal-dual framework to handle the constraints. This is the first work to prove convergence analytically for an NPG-AC method in a chance-constrained LQG setting without any model knowledge. The analysis establishes coercivity and gradient dominance properties of the Lagrangian function, which together guarantee linear convergence and closed-loop stability during actor training. For the critic, the authors use temporal difference (TD(0)) learning and stochastic approximation theory to show that its estimates converge. The paper also proves there is no duality gap, meaning the Lagrangian relaxation is exact.
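To make the primal-dual idea concrete, here is a minimal sketch on a scalar system. It is not the authors' algorithm. It evaluates each policy with the true model parameters, a step the paper replaces with a critic learned from data, and it assumes the chance constraint applies to the stationary state, where Gaussianity turns it into a variance bound. The plant, the weights, the step sizes, and every function name are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical scalar plant x[t+1] = a*x[t] + b*u[t] + w[t], w ~ N(0, sigma_w2),
# controlled by linear state feedback u[t] = -k*x[t]. All values are illustrative.
a, b = 1.1, 1.0
q, r = 1.0, 10.0              # state and input weights of the quadratic cost
sigma_w2 = 1.0                # process-noise variance
c, delta = 2.5, 0.05          # chance constraint: P(|x| > c) <= delta

# Assumption: the stationary state is zero-mean Gaussian, so the chance
# constraint is equivalent to the variance bound E[x^2] <= v_max.
z = NormalDist().inv_cdf(1.0 - delta / 2.0)
v_max = (c / z) ** 2


def state_variance(k):
    """Stationary variance of x under gain k (requires |a - b*k| < 1)."""
    rho = a - b * k
    return sigma_w2 / (1.0 - rho * rho)


def violation_probability(k):
    """P(|x| > c) for the stationary Gaussian state under gain k."""
    return math.erfc(c / math.sqrt(2.0 * state_variance(k)))


def npg_step(k, lam, eta):
    """One natural-gradient step on the Lagrangian, an LQ cost with weight q + lam.

    Model-based stand-in: p is computed from (a, b) instead of being estimated.
    """
    rho = a - b * k
    p = (q + lam + r * k * k) / (1.0 - rho * rho)   # scalar Lyapunov solution
    e = (r + b * b * p) * k - a * b * p             # natural-gradient direction / 2
    return k - 2.0 * eta * e


def primal_dual(k0, eta=0.01, beta=1.0, outer=200, inner=20):
    """Inner primal descent on the gain, outer projected dual ascent on lam."""
    k, lam = k0, 0.0
    for _ in range(outer):
        for _ in range(inner):
            k = npg_step(k, lam, eta)
        lam = max(0.0, lam + beta * (state_variance(k) - v_max))
    return k, lam


k_unc, _ = primal_dual(0.5, beta=0.0)   # beta = 0 keeps lam at 0: plain LQ gain
k_con, lam = primal_dual(0.5)           # constrained gain and its multiplier
print(f"unconstrained: k={k_unc:.3f}, P(violation)={violation_probability(k_unc):.3f}")
print(f"constrained:   k={k_con:.3f}, P(violation)={violation_probability(k_con):.3f}, lam={lam:.3f}")
```

With these illustrative numbers, the unconstrained gain leaves the violation probability above delta. The multiplier then rises, which increases the effective state weight and pushes the gain up until the variance bound is met.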
Numerical experiments compare the proposed model-free approach against model-based chance-constrained LQR and scenario-based MPC. The results show that the algorithm limits the probability of violating state thresholds to user-specified levels while maintaining near-optimal performance, without knowledge of the system dynamics and without solving an optimization problem in real time. The work is under review at the IEEE Open Journal of Control Systems. It is a step toward practical, trustworthy control for systems where risk constraints are critical and model uncertainty is unavoidable, such as autonomous vehicles, robotics, and power systems.
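As a rough illustration of how a violation probability can be checked empirically, the sketch below simulates a scalar closed loop and compares the observed violation frequency with the closed-form Gaussian value. It is not taken from the paper's experiments. The plant, the gain, the threshold, and the function names are all hypothetical.

```python
import math
import random


def empirical_violation_rate(a, b, k, c, sigma_w, steps, seed=0, burn_in=1000):
    """Fraction of time steps with |x| > c for x[t+1] = (a - b*k)*x[t] + w[t]."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    x = 0.0
    violations = 0
    for t in range(burn_in + steps):
        x = (a - b * k) * x + rng.gauss(0.0, sigma_w)
        # Skip the burn-in so samples come from (approximately) the stationary law.
        if t >= burn_in and abs(x) > c:
            violations += 1
    return violations / steps


def stationary_violation_probability(a, b, k, c, sigma_w):
    """Closed-form P(|x| > c) for the stationary Gaussian state (needs |a - b*k| < 1)."""
    rho = a - b * k
    variance = sigma_w ** 2 / (1.0 - rho ** 2)
    return math.erfc(c / math.sqrt(2.0 * variance))


# Hypothetical plant, gain, and threshold chosen only for illustration.
A, B, K, C, SIGMA_W = 1.1, 1.0, 0.48, 2.5, 1.0
rate = empirical_violation_rate(A, B, K, C, SIGMA_W, steps=200_000)
exact = stationary_violation_probability(A, B, K, C, SIGMA_W)
print(f"empirical {rate:.4f} vs closed form {exact:.4f}")
```

The simulation needs only trajectories, not the model, which is why a frequency count of this kind suits a model-free controller. The closed-form value is included purely as a cross-check.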
- First convergence analysis for NPG-based actor-critic in chance-constrained LQG without model knowledge
- Proves linear convergence and closed-loop stability via coercivity and gradient dominance of the Lagrangian
- Numerical comparison with model-based chance-constrained LQR and scenario-based MPC shows the method holds the violation probability to the user-specified level without real-time optimization
Why It Matters
Lets autonomous systems enforce probabilistic safety constraints without a system model, which is crucial for real-world robotics and power grids.