Research & Papers

Kang and You extend DeePO to LQT with linear convergence and a fixed number of decision variables

The new algorithm solves tracking control with a constant number of decision variables that does not grow with the data horizon.

Deep Dive

Direct data-driven optimal control has struggled with real-time applicability because the number of online decision variables grows with the amount of data. The recent DeePO (Data-EnablEd Policy Optimization) method for the Linear Quadratic Regulator (LQR) removed this growth via a sample-covariance parameterization, but extending it to Linear Quadratic Tracking (LQT) remained an open challenge. The difficulty lies in the coupling between time-varying references and the feedback-feedforward policy structure, which prevents a direct application of the constant-dimension parameterization.
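
To make the fixed-dimension idea concrete, here is a minimal sketch of the sample-covariance parameterization as it is usually stated for the LQR case. It assumes noiseless data from x_{k+1} = A x_k + B u_k and a policy u = K x. The names `collect_data`, `covariance_parameterization`, and `gain_to_V`, and the symbols Phi, X1_bar, and V, are illustrative assumptions rather than the authors' code. The point to notice is that V has shape (m+n) x n whether the data horizon t is 20 or 200.

```python
import numpy as np

def collect_data(A, B, t, rng):
    """Roll out x_{k+1} = A x_k + B u_k with random inputs (noiseless, assumed)."""
    n, m = B.shape
    X = np.zeros((n, t + 1))
    U = rng.standard_normal((m, t))
    X[:, 0] = rng.standard_normal(n)
    for k in range(t):
        X[:, k + 1] = A @ X[:, k] + B @ U[:, k]
    return X[:, :-1], U, X[:, 1:]  # X0, U0, X1

def covariance_parameterization(X0, U0, X1):
    """Compress t data columns into matrices whose size does not depend on t."""
    t = X0.shape[1]
    D0 = np.vstack([U0, X0])   # stacked inputs and states, (m+n) x t
    Phi = D0 @ D0.T / t        # sample covariance, (m+n) x (m+n)
    X1_bar = X1 @ D0.T / t     # cross term with successor states, n x (m+n)
    return Phi, X1_bar

def gain_to_V(Phi, K):
    """Solve Phi V = [K; I] for the fixed-size decision variable V."""
    n = K.shape[1]
    return np.linalg.solve(Phi, np.vstack([K, np.eye(n)]))

# Hypothetical 2-state, 1-input system and feedback gain, for illustration only.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.1, -0.3]])
rng = np.random.default_rng(0)

for t in (20, 200):
    X0, U0, X1 = collect_data(A, B, t, rng)
    Phi, X1_bar = covariance_parameterization(X0, U0, X1)
    V = gain_to_V(Phi, K)
    # With noiseless data, X1_bar @ V reproduces the closed loop A + B K.
    print(t, V.shape, np.allclose(X1_bar @ V, A + B @ K))
```

In the noiseless case X1 = [B A] D0, so X1_bar = [B A] Phi and X1_bar V = A + B K. The closed loop is therefore expressed through data-dependent matrices whose sizes depend only on n and m.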

Shubo Kang and Keyou You address this by first introducing a reference-decoupled reformulation of LQT that naturally accommodates the covariance parameterization, guaranteeing a fixed dimension of decision variables regardless of data horizon. This formulation is proven exactly equivalent to the indirect certainty-equivalence LQT solution. Leveraging this characterization, the authors develop both offline and online DeePO algorithms for LQT.
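
For context on the indirect certainty-equivalence baseline mentioned above, the following is a rough sketch: estimate (A, B) from data by least squares, then treat the estimate as exact and compute a feedback gain by Riccati iteration. It covers only the feedback part of a tracking policy. The reference-dependent feedforward term and the paper's reference-decoupled reformulation are not reproduced here, since the summary does not specify them. The function names, the test system, and the weights Q = I, R = I are all assumptions.

```python
import numpy as np

def least_squares_model(X0, U0, X1):
    """Least-squares estimate [B_hat A_hat] = X1 D0^T (D0 D0^T)^{-1}."""
    D0 = np.vstack([U0, X0])
    BA = np.linalg.solve(D0 @ D0.T, D0 @ X1.T).T
    m = U0.shape[0]
    return BA[:, m:], BA[:, :m]  # A_hat, B_hat

def riccati_gain(A, B, Q, R, iters=500):
    """Fixed-point Riccati iteration; returns K for the policy u = K x."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ G)
    G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return -G

# Hypothetical system with small process noise, for illustration only.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
t = 100
U0 = rng.standard_normal((1, t))
X = np.zeros((2, t + 1))
for k in range(t):
    noise = 0.01 * rng.standard_normal(2)
    X[:, k + 1] = A_true @ X[:, k] + B_true @ U0[:, k] + noise
X0, X1 = X[:, :-1], X[:, 1:]

A_hat, B_hat = least_squares_model(X0, U0, X1)
K_ce = riccati_gain(A_hat, B_hat, np.eye(2), np.eye(1))

# Spectral radius of the true closed loop under the certainty-equivalence gain.
rho = max(abs(np.linalg.eigvals(A_true + B_true @ K_ce)))
print("spectral radius:", rho)
```

The least-squares estimate is built from the same sample covariance quantities used by the covariance parameterization, which gives some intuition for why a direct method in those variables can match the indirect solution.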

Theoretically, the paper proves global linear convergence for the offline algorithm using local gradient dominance and smoothness. For the online setting, the optimality gap decays linearly up to a bias term that scales inversely with the signal-to-noise ratio (SNR). Numerical simulations verify these theoretical results and demonstrate the superior tracking performance of the proposed method. This work paves the way for efficient real-time optimal control in applications such as robotics, autonomous vehicles, and industrial automation where time-varying targets are common.
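
The way gradient dominance and smoothness combine to give a linear rate can be sketched with the standard textbook argument below. It is a generic derivation, not the paper's proof: the cost C, the constants \mu and L, the step size 1/L, the contraction factor \rho, and the bias b are assumed symbols, and the paper's actual constants and conditions may differ.

```latex
% Gradient dominance and L-smoothness, assumed to hold on a sublevel set of C:
\|\nabla C(V)\|^2 \ge 2\mu\,\bigl(C(V) - C^\star\bigr), \qquad
C(V') \le C(V) + \langle \nabla C(V),\, V' - V \rangle + \tfrac{L}{2}\,\|V' - V\|^2 .

% One gradient step V^+ = V - \tfrac{1}{L}\nabla C(V):
C(V^+) - C^\star
  \le C(V) - C^\star - \tfrac{1}{2L}\,\|\nabla C(V)\|^2
  \le \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(C(V) - C^\star\bigr).

% Online case: if the gap e_k obeys e_{k+1} \le \rho\, e_k + b with 0 \le \rho < 1, then
e_k \le \rho^{k} e_0 + \frac{b}{1 - \rho}.
```

The last line is the shape of the online result described above: a geometrically decaying term plus a residual, where the residual shrinks as the SNR grows if b is inversely proportional to the SNR.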

Key Points
  • Reference-decoupled reformulation of LQT enables fixed-dimension decision variables independent of data horizon.
  • Offline DeePO algorithm for LQT has provable global linear convergence.
  • Online algorithm's optimality gap decays linearly up to a bias term that is inversely proportional to the signal-to-noise ratio (SNR).

Why It Matters

The method enables efficient real-time optimal tracking control for autonomous systems with time-varying targets, because the number of decision variables stays fixed as more data is collected.