Research & Papers

Scalar Federated Learning for Linear Quadratic Regulator

New algorithm trains fleets of robots or drones by sending just one number per agent, not gigabytes of data.

Deep Dive

A team of researchers has introduced ScalarFedLQR, a federated learning algorithm for training optimal control policies across fleets of heterogeneous agents, such as drones, robots, or autonomous vehicles. The core innovation is a decomposed projected gradient mechanism: each agent computes a local estimate of how to improve its control policy but communicates only a single, carefully chosen scalar projection of that estimate to a central server. This reduces the per-agent communication cost from scaling with the policy's dimension (O(d)) to a constant (O(1)), a dramatic reduction that can turn gigabytes of network traffic into a trickle of numbers.
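The mechanics of that O(d)-to-O(1) compression can be sketched in a few lines. This is an illustrative design, not the paper's exact algorithm: it assumes the server and agents derive the same random unit direction each round from a shared seed (so the direction itself costs no bandwidth), and the helper names (`unit_direction`, `agent_message`, `server_update_direction`) are hypothetical.

```python
import random

def unit_direction(dim, rng):
    # Hypothetical design choice: both sides regenerate the same random
    # unit direction from a shared seed, so broadcasting it is free.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def agent_message(local_grad, v):
    # Each agent transmits ONE scalar: the projection of its local policy
    # gradient onto the shared direction -- O(1) instead of O(d).
    return sum(g * x for g, x in zip(local_grad, v))

def server_update_direction(scalars, v, dim):
    # The server averages the scalars and rescales along v; the dim factor
    # makes the estimate unbiased over random directions, since
    # E[dim * <g, v> * v] = g when v is uniform on the unit sphere.
    mean_s = sum(scalars) / len(scalars)
    return [dim * mean_s * x for x in v]
```

With a fixed direction v = [1, 0] and a single agent whose local gradient is [3, 0], the message is the scalar 3.0 and the reconstruction is [6.0, 0.0]; the factor of dim = 2 cancels only on average over random directions, which is exactly the information loss the aggregation step is designed to absorb.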

Crucially, the method turns a potential weakness—the information loss from compressing data to a scalar—into a strength through scaling laws. As more agents participate in the fleet, the server's aggregation of their individual scalar messages becomes a highly accurate reconstruction of the true global gradient direction needed for training. This means larger fleets not only enable more accurate learning but also allow for larger algorithmic step sizes, leading to faster linear convergence to an optimal control policy. The researchers proved that all intermediate policies remain stable and demonstrated in simulations that ScalarFedLQR achieves performance comparable to full-gradient methods while using a fraction of the communication bandwidth.
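The scaling law is easy to check numerically with a small Monte-Carlo experiment. This sketch is not the paper's analysis: it assumes a fixed "true" gradient, gives each agent its own independently drawn random direction (so the per-agent projection errors average out), and measures how far the server's reconstruction lands from the truth as the fleet grows.

```python
import random

def estimate_error(num_agents, dim, trials, seed):
    # Average reconstruction error when num_agents each send one scalar
    # projection of the same true gradient g onto their own random direction.
    rng = random.Random(seed)
    g = [1.0] * dim  # arbitrary fixed "true" global gradient
    total = 0.0
    for _ in range(trials):
        est = [0.0] * dim
        for _ in range(num_agents):
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            n = sum(x * x for x in v) ** 0.5
            v = [x / n for x in v]
            s = sum(a * b for a, b in zip(g, v))   # the one scalar sent
            for k in range(dim):
                # unbiased per-agent estimate dim * s * v, averaged over agents
                est[k] += dim * s * v[k] / num_agents
        total += sum((a - b) ** 2 for a, b in zip(est, g)) ** 0.5
    return total / trials
```

Comparing `estimate_error(10, 8, 50, 0)` against `estimate_error(100, 8, 50, 0)` shows the error shrinking with fleet size, roughly like 1/sqrt(num_agents), which is the mechanism behind "larger fleets allow larger step sizes."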

This approach targets model-free learning of Linear Quadratic Regulator (LQR) problems, a foundational framework in control theory for balancing regulation performance against control effort. By removing the communication bottleneck, ScalarFedLQR makes it practically feasible to continuously and privately train control policies for massive, distributed systems where sharing raw data is impossible or prohibitively expensive.
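For readers unfamiliar with LQR, the one-dimensional case makes the objective concrete. In the sketch below (my own toy example, not from the paper), the dynamics are x_{t+1} = a*x + b*u with policy u = -k*x and stage cost q*x^2 + r*u^2; the closed-form cost lets a simple finite-difference gradient step stand in for the paper's model-free gradient estimator.

```python
def lqr_cost(a, b, q, r, k):
    # Infinite-horizon cost of policy u = -k*x for scalar dynamics
    # x_{t+1} = a*x + b*u, stage cost q*x^2 + r*u^2, initial state x_0 = 1:
    # J(k) = (q + r*k^2) / (1 - c^2), where c = a - b*k is the closed loop.
    c = a - b * k
    assert abs(c) < 1, "policy must be stabilizing"
    return (q + r * k * k) / (1.0 - c * c)

def improve(a, b, q, r, k, lr=0.05, steps=200, eps=1e-6):
    # Finite-difference gradient descent on J(k) -- an illustrative stand-in
    # for a model-free policy-gradient update, not the paper's estimator.
    for _ in range(steps):
        grad = (lqr_cost(a, b, q, r, k + eps)
                - lqr_cost(a, b, q, r, k - eps)) / (2 * eps)
        k -= lr * grad
    return k
```

Starting from the stabilizing gain k = 0.9 on the open-loop-unstable system a = 1.2, b = 1, descent strictly lowers the cost while every intermediate gain stays stabilizing, mirroring the stability guarantee the paper proves for its own updates.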

Key Points
  • Cuts communication from O(d) to O(1): Each agent sends a single number instead of a high-dimensional gradient, reducing per-round bandwidth by 99%+ for high-dimensional policies.
  • Leverages fleet size for accuracy: The approximation error from scalar projection diminishes as more agents join, creating a favorable scaling law for large systems.
  • Enables linear convergence: Under standard conditions, the algorithm guarantees that every intermediate policy is stabilizing and that the average system cost decreases at a linear (geometric) rate, matching full-gradient performance.
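The three points above can be exercised end-to-end on a toy convex objective. This is a hedged stand-in for the LQR cost, not the paper's algorithm: heterogeneous agents each hold a target t_i, the global objective is f(x) = mean_i ||x - t_i||^2, and every round each agent sends exactly one scalar.

```python
import random

def federated_round(x, targets, lr, rng):
    # One round: a shared random direction, one scalar per agent, and a
    # server step along the reconstructed direction.
    dim = len(x)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = sum(a * a for a in v) ** 0.5
    v = [a / n for a in v]                       # shared unit direction
    scalars = []
    for t in targets:
        grad = [2.0 * (xi - ti) for xi, ti in zip(x, t)]
        scalars.append(sum(g * a for g, a in zip(grad, v)))   # O(1) message
    # By linearity, the mean scalar equals the projection of the averaged
    # gradient, so heterogeneity across agents cancels exactly.
    mean_s = sum(scalars) / len(scalars)
    return [xi - lr * dim * mean_s * a for xi, a in zip(x, v)]
```

Running a few hundred rounds with 20 agents in 5 dimensions drives the iterate to the minimizer (the mean of the targets); for lr * dim < 1 the distance to the optimum shrinks monotonically every round, a toy analogue of the linear-rate claim.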

Why It Matters

Enables real-time, privacy-preserving AI training for massive fleets of robots, drones, and IoT devices where bandwidth is limited.