Agent Frameworks

High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking

GT-DSGD offers the first high-probability guarantees for bias-corrected decentralized optimization.

Deep Dive

A new paper from Aleksandar Armacki, Haoyuan Cai, and Ali H. Sayed at EPFL introduces GT-DSGD, a decentralized stochastic optimization algorithm that augments decentralized SGD with gradient tracking, and establishes high-probability (HP) convergence guarantees for it. Prior HP results for decentralized optimization relied heavily on plain Decentralized SGD (DSGD) and required strong assumptions such as bounded data heterogeneity or strong convexity of each agent's cost. This limited their applicability, especially compared to mean-squared-error (MSE) analyses, in which bias-correction techniques like gradient tracking work under far milder conditions.
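To make the gradient-tracking mechanism concrete, here is a minimal NumPy sketch. It is an illustration under standard assumptions (a doubly stochastic mixing matrix W over the agent network, a user-supplied stochastic-gradient oracle), not the paper's exact pseudocode; the names gt_dsgd and stoch_grad and the precise update ordering are ours.

```python
import numpy as np

def gt_dsgd(W, stoch_grad, x0, alpha, T, rng):
    """Sketch of decentralized SGD with gradient tracking (GT-DSGD).

    W          : (n, n) doubly stochastic mixing matrix of the agent network
    stoch_grad : stoch_grad(i, x, rng) -> noisy gradient of agent i's cost at x
    x0         : (n, d) initial iterates, one row per agent
    alpha      : step size
    T          : number of iterations
    """
    n, d = x0.shape
    x = x0.copy()
    # g[i] caches agent i's latest stochastic gradient; y[i] tracks the
    # network-average gradient via dynamic average consensus.
    g = np.stack([stoch_grad(i, x[i], rng) for i in range(n)])
    y = g.copy()
    for _ in range(T):
        # Consensus step on the models, descent along the tracked gradient.
        x = W @ (x - alpha * y)
        g_new = np.stack([stoch_grad(i, x[i], rng) for i in range(n)])
        # Gradient-tracking update: mix neighbors' trackers, then correct
        # with the local gradient increment. This is the bias-correction
        # that plain DSGD lacks under data heterogeneity.
        y = W @ y + g_new - g
        g = g_new
    return x
```

The tracker y is what distinguishes GT-DSGD from DSGD: each agent descends along an estimate of the global average gradient rather than its own local gradient, which removes the heterogeneity-induced bias.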

GT-DSGD bridges this gap by proving order-optimal HP rates for both non-convex and Polyak-Łojasiewicz (PL) objectives. Specifically, the algorithm achieves O(log(1/δ)/√(nT)) for non-convex costs and O(log(1/δ)/(nT)) for PL costs, where n is the number of agents, T is the time horizon, and δ is the failure probability (each bound holds with probability at least 1 − δ). These rates match the best possible under the given noise model, and crucially, they require only a relaxed sub-Gaussian noise condition rather than bounded heterogeneity or strong convexity. Numerical experiments on real and synthetic data validate the theory, showing that GT-DSGD retains the practical benefits of bias-correction in the high-probability sense as well. This is the first HP guarantee for any decentralized optimization method that incorporates bias-correction, marking a significant step toward more robust multi-agent machine learning.
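Schematically, with constants suppressed, the two guarantees can be written as below. The use of the network-average iterate x̄_t and of the time-averaged squared gradient norm as the non-convex stationarity measure is the conventional choice for rates of this form; the paper's exact statements may differ in these details.

```latex
% Both bounds hold with probability at least 1 - \delta (schematic form).
% Non-convex costs: stationarity of the network-average iterate
\frac{1}{T}\sum_{t=1}^{T}\bigl\|\nabla f(\bar{x}_t)\bigr\|^{2}
  \;=\; \mathcal{O}\!\left(\frac{\log(1/\delta)}{\sqrt{nT}}\right)
% Polyak-Lojasiewicz costs: optimality gap
f(\bar{x}_T) - f^{\star}
  \;=\; \mathcal{O}\!\left(\frac{\log(1/\delta)}{nT}\right)
```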

Key Points
  • GT-DSGD achieves high-probability convergence under the same relaxed assumptions as MSE analyses, unlike prior DSGD results.
  • Order-optimal rates: O(log(1/δ)/√(nT)) for non-convex costs and O(log(1/δ)/(nT)) for Polyak-Łojasiewicz costs.
  • First HP guarantees for bias-corrected decentralized optimization methods, validated on real and synthetic data (a toy synthetic run of the sketch above follows this list).
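Continuing the NumPy sketch above, here is a toy run on a synthetic decentralized least-squares problem. All problem data below is illustrative and not drawn from the paper's experiments.

```python
# Toy run of the gt_dsgd sketch on synthetic least squares.
rng = np.random.default_rng(0)
n, d = 8, 5
A = rng.normal(size=(n, 20, d))           # each agent's local data
b = rng.normal(size=(n, 20))

def stoch_grad(i, x, rng):
    j = rng.integers(20)                   # sample one local data point
    return (A[i, j] @ x - b[i, j]) * A[i, j]

W = np.full((n, n), 1.0 / n)               # fully connected, uniform mixing
x = gt_dsgd(W, stoch_grad, np.zeros((n, d)), alpha=0.05, T=2000, rng=rng)
print(np.linalg.norm(x - x.mean(axis=0)))  # agents reach near-consensus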

Why It Matters

High-probability bounds certify the behavior of individual training runs rather than just the average over runs, and GT-DSGD delivers them under the same mild assumptions as MSE analyses, making decentralized, multi-agent ML systems more trustworthy.