
Convex Optimization Unlocks Robust Inverse RL for Uncertain Linear Systems

New data-driven method replaces iterative loops with convex optimization, achieving robust cost recovery from expert trajectories.

Deep Dive

Researchers Duc Cuong Nguyen and Phuong Nam Dao have developed a convex-optimization-based framework for data-driven inverse reinforcement learning (IRL) in discrete-time linear systems, covering both nominal and uncertain models. Traditional IRL methods rely on iterative policy or value updates and repeated matrix inversions, and they often require an initial stabilizing controller. These requirements hurt numerical robustness and complicate practical deployment. The new approach replaces the iterative loops with a semidefinite programming (SDP) formulation that directly recovers an equivalent state-cost matrix and a stabilizing controller from expert trajectories. For systems with model uncertainty, the authors show that standard LQR costs cannot represent every stabilizing target gain, which motivates a generalized LQR cost with a state–input cross term.
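
The role of the cross term can be illustrated with a small numerical check. The generalized LQR stage cost is x'Qx + 2x'Nu + u'Ru, and a gain K (with u = -Kx) is optimal when P = Q + A'PA - (A'PB + N)(R + B'PB)^(-1)(B'PA + N') and K = (R + B'PB)^(-1)(B'PA + N'). The Python sketch below uses a textbook-style inverse-LQR construction, not the authors' SDP: for an assumed stabilizing K it fixes R and a closed-loop weight W, solves a Lyapunov equation for P, and then picks N and Q so that both optimality conditions hold. The plant, gain, weights, and helper names (`solve_dlyap`, `cross_term_cost_for_gain`, `riccati_residuals`) are hypothetical.

```python
import numpy as np

def solve_dlyap(A_cl, W):
    """Solve P = A_cl' P A_cl + W using the vectorized (Kronecker) form."""
    n = A_cl.shape[0]
    lhs = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    P = np.linalg.solve(lhs, W.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def cross_term_cost_for_gain(A, B, K, R, W):
    """Hypothetical helper: build (Q, N) so that u = -K x is optimal for
    the generalized cost x'Qx + 2x'Nu + u'Ru with the given R."""
    A_cl = A - B @ K
    P = solve_dlyap(A_cl, W)              # value matrix of the target policy
    M = R + B.T @ P @ B
    N = K.T @ M - A.T @ P @ B             # cross term that enforces the gain condition
    Q = P - A.T @ P @ A + K.T @ M @ K     # state weight that enforces the Riccati equation
    return Q, N, P

def riccati_residuals(A, B, Q, R, N, P, K):
    """Max-abs residuals of the cross-term Riccati equation and gain formula."""
    M = R + B.T @ P @ B
    G = A.T @ P @ B + N
    are = Q + A.T @ P @ A - G @ np.linalg.solve(M, G.T) - P
    gain = np.linalg.solve(M, G.T) - K
    return np.abs(are).max(), np.abs(gain).max()

# Assumed toy plant (discretized double integrator) and target gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
K = np.array([[2.0, 3.0]])
R = np.eye(1)
W = np.eye(2)

rho = np.max(np.abs(np.linalg.eigvals(A - B @ K)))  # < 1 means K is stabilizing
Q, N, P = cross_term_cost_for_gain(A, B, K, R, W)
are_res, gain_res = riccati_residuals(A, B, Q, R, N, P, K)
print(f"spectral radius {rho:.3f}, Riccati residual {are_res:.1e}, gain residual {gain_res:.1e}")
```

This sketch does not enforce any sign constraint on the joint stage-cost matrix [[Q, N], [N', R]], whereas an SDP formulation can impose such constraints explicitly as linear matrix inequalities.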

To handle model perturbations, the authors use differentiable semidefinite programming and stochastic approximation to design a cost that is robust over a population of uncertain models. The framework is model-free and off-policy: the unknown system matrices are replaced with a kernel matrix regressed from local input–state data. In simulations on a discrete-time power-system example, the method accurately recovers the expert behavior and is more robust to gain-estimation errors and model mismatch than classical iterative IRL schemes. The work points toward IRL for uncertain control systems that is computationally simpler and more practical to deploy.
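
The idea of replacing (A, B) with a kernel matrix regressed from input–state data can be illustrated with a standard least-squares Q-function fit for linear-quadratic problems. The Python sketch below assumes the kernel is the matrix H in Q_K(x, u) = z'Hz, with z = [x; u], for a fixed policy u = -Kx under a cross-term cost. It fits H from one-step transitions driven by exploratory inputs (hence off-policy) using only the Bellman equation, then checks the fit against a model-based reference. The plant, cost weights, gain, and sample count are assumptions, and the paper's exact kernel and regression may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed plant: unknown to the learner, used here only to generate data.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
n, m = B.shape

# Assumed generalized stage cost x'Qx + 2x'Nu + u'Ru and target policy u = -K x.
Q = np.diag([1.0, 0.5])
R = np.array([[1.0]])
N = np.array([[0.05], [0.0]])
K = np.array([[2.0, 3.0]])

def stage_cost(x, u):
    return x @ Q @ x + 2.0 * x @ N @ u + u @ R @ u

# Off-policy data: random states with exploratory inputs (not u = -K x).
rows, costs = [], []
for _ in range(200):
    x = rng.standard_normal(n)
    u = rng.standard_normal(m)
    x_next = A @ x + B @ u
    z = np.concatenate([x, u])
    z_next = np.concatenate([x_next, -K @ x_next])  # target policy's action at x_next
    # Bellman equation z'Hz - z_next'H z_next = stage cost is linear in vec(H).
    rows.append(np.kron(z, z) - np.kron(z_next, z_next))
    costs.append(stage_cost(x, u))

h, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
H_hat = h.reshape(n + m, n + m)
H_hat = 0.5 * (H_hat + H_hat.T)  # only the symmetric part of H is identifiable

# Model-based reference, used only to check the regression.
A_cl = A - B @ K
Q_cl = Q - N @ K - K.T @ N.T + K.T @ R @ K  # stage cost under u = -K x
P = np.linalg.solve(np.eye(n * n) - np.kron(A_cl.T, A_cl.T), Q_cl.reshape(-1)).reshape(n, n)
H_true = np.block([[Q + A.T @ P @ A, N + A.T @ P @ B],
                   [N.T + B.T @ P @ A, R + B.T @ P @ B]])
print("max |H_hat - H_true| =", np.abs(H_hat - H_true).max())
```

Once H is estimated, quantities that would normally need A and B, such as the greedy gain H_uu^(-1) H_ux, can be read off its blocks. The exploratory inputs are what make the regression well-posed: data generated only by u = -Kx would not separately excite the input-related blocks of H.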

Key Points
  • Replaces iterative policy/value updates and repeated matrix inversions with convex semidefinite programming for cost recovery.
  • Introduces a generalized LQR cost with a state–input cross term for uncertain linear systems, where standard LQR costs cannot represent all stabilizing target gains.
  • Simulations on a power-system example show accurate behavior recovery and improved robustness to model mismatch and gain-estimation errors.
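
Robustness to model mismatch can be probed with a simple Monte Carlo test: perturb the nominal (A, B), apply a fixed gain, and count how often the closed loop stays stable. The Python sketch below is only one hypothetical way to quantify this. The perturbation model (i.i.d. Gaussian entries scaled by `sigma`), the toy plant and gain, and the helper `stable_fraction` are assumptions, not the metrics used in the paper.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude; < 1 means the discrete-time loop is stable."""
    return np.max(np.abs(np.linalg.eigvals(M)))

def stable_fraction(A, B, K, sigma, samples=1000, seed=0):
    """Hypothetical robustness metric: fraction of perturbed models
    (A + dA, B + dB), with i.i.d. N(0, sigma^2) entries, stabilized by u = -K x."""
    rng = np.random.default_rng(seed)
    stable = 0
    for _ in range(samples):
        dA = sigma * rng.standard_normal(A.shape)
        dB = sigma * rng.standard_normal(B.shape)
        if spectral_radius((A + dA) - (B + dB) @ K) < 1.0:
            stable += 1
    return stable / samples

# Assumed nominal plant and gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
K = np.array([[2.0, 3.0]])

fractions = {s: stable_fraction(A, B, K, s) for s in (0.0, 0.01, 0.05, 1.0)}
for s, f in fractions.items():
    print(f"sigma={s:<5} stable fraction={f:.3f}")
```

Calling `stable_fraction` for two candidate gains with the same `sigma` and `seed` gives a paired comparison, because both gains are evaluated on identical perturbation draws.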

Why It Matters

Simpler, more robust inverse RL for real-world control systems with uncertain dynamics, a capability that matters for robotics, autonomous vehicles, and power grids.