Research & Papers

Optimistic Online LQR via Intrinsic Rewards

New algorithm achieves the optimal √T regret rate while being substantially simpler than existing methods.

Deep Dive

A team from ETH Zurich consisting of Marcell Bartos, Andreas Krause, and Florian Dörfler has developed IR-LQR (Intrinsic Rewards LQR), a novel algorithm for online control of linear dynamical systems. The approach carries the reinforcement-learning ideas of intrinsic rewards and variance regularization over to the classic Linear Quadratic Regulator problem, yielding an "optimistic" algorithm that balances exploration and exploitation. Unlike existing methods, which require complex iterative search procedures or computationally demanding optimization, IR-LQR retains the standard LQR structure and modifies only the cost function, resulting in what the authors describe as an "intuitively pleasing, simple, computationally cheap, and efficient algorithm."
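To make the "modify only the cost function" idea concrete, here is a minimal sketch of a standard discrete-time LQR solver whose state-cost matrix is optimistically reduced by an intrinsic-reward bonus tied to a parameter-uncertainty proxy. The system matrices, the bonus form `Q - beta * Sigma`, and the names `beta` and `Sigma` are illustrative assumptions, not the paper's actual construction; the point is that the Riccati machinery is untouched.

```python
import numpy as np

def dare_gain(A, B, Q, R, iters=500):
    """Solve the discrete-time Riccati equation by fixed-point
    iteration and return the LQR feedback gain K (u = -K x)."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Illustrative double-integrator system (not from the paper)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

# Hypothetical "optimistic" cost: subtract an intrinsic-reward bonus
# proportional to an uncertainty proxy Sigma. Only the cost changes;
# the same LQR solver is reused unchanged.
beta = 0.3               # exploration weight (assumed)
Sigma = 0.5 * np.eye(2)  # parameter-uncertainty proxy (assumed)
Q_opt = Q - beta * Sigma # still positive definite here

K_nominal = dare_gain(A, B, Q, R)
K_optimistic = dare_gain(A, B, Q_opt, R)
print("nominal gain:   ", K_nominal)
print("optimistic gain:", K_optimistic)
```

The optimistic gain penalizes uncertain state directions less, which nudges the closed loop toward exploring them while keeping the controller a plain LQR feedback law.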

The algorithm achieves the optimal worst-case regret rate of √T: cumulative regret grows no faster than the square root of the horizon T, so the average per-step gap to the optimal controller shrinks as more data is collected. The researchers validated IR-LQR through numerical experiments on two practical control problems: aircraft pitch-angle control and unmanned aerial vehicle (UAV) operation. These tests compared IR-LQR against several state-of-the-art online LQR algorithms, demonstrating its effectiveness at learning control policies for unknown systems using only closed-loop data collected during real-time operation. The approach represents a significant simplification in adaptive control, potentially making advanced control algorithms more accessible for real-world applications where computational resources are limited.
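A quick numerical check of why a √T cumulative-regret bound implies improving performance: if cumulative regret is at most C·√T for some constant C (the value 5.0 below is purely illustrative), then the average regret per step is C/√T, which vanishes as T grows.

```python
import math

C = 5.0  # hypothetical constant in the regret bound
for T in (100, 10_000, 1_000_000):
    avg = C * math.sqrt(T) / T  # average per-step regret = C / sqrt(T)
    print(f"T = {T:>9}: avg regret per step = {avg:.4f}")
```

So even though total regret keeps growing, the controller's per-step cost approaches that of the optimal policy.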

Key Points
  • Achieves optimal √T regret rate for online learning of linear systems
  • Maintains standard LQR structure with only cost function modifications
  • Tested on aircraft pitch control and UAV examples with state-of-the-art comparisons

Why It Matters

Enables simpler, more efficient AI control systems for autonomous vehicles, drones, and industrial automation.