Mirror Descent on Riemannian Manifolds
New framework extends scalable Mirror Descent to curved spaces like the Stiefel manifold, with proven convergence guarantees.
A team of researchers including Jiaxin Jiang has published a significant theoretical advance in optimization for machine learning. Their paper, 'Mirror Descent on Riemannian Manifolds,' generalizes the powerful and scalable Mirror Descent (MD) algorithm, a cornerstone of large-scale optimization in AI, to Riemannian manifolds: curved geometric spaces that naturally describe structural parameter constraints, such as requiring weight matrices to have orthonormal columns (the Stiefel manifold). The authors develop a full Riemannian Mirror Descent (RMD) framework through reparameterization and also propose a stochastic variant for handling noisy data, complete with rigorous non-asymptotic convergence guarantees.
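To ground the discussion, the sketch below shows one step of classical Mirror Descent in flat space, using the standard negative-entropy mirror map on the probability simplex (the exponentiated-gradient update). It illustrates the algorithm being generalized, not the paper's Riemannian reparameterization; the toy cost vector and step size `eta` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def mirror_descent_step(x, grad, eta):
    """One classical Mirror Descent step with the negative-entropy mirror map.

    On the probability simplex this is the exponentiated-gradient update:
    the gradient step is taken in the dual (log) space, and the
    multiplicative form plus renormalization maps back to the simplex.
    """
    y = x * np.exp(-eta * grad)   # gradient step in the mirror (dual) space
    return y / y.sum()            # map back onto the probability simplex

# Toy usage: minimize the linear cost f(x) = <c, x> over the simplex.
c = np.array([3.0, 1.0, 2.0])
x = np.ones(3) / 3
for _ in range(200):
    x = mirror_descent_step(x, c, eta=0.1)
print(x)  # concentrates near [0, 1, 0], the vertex with the smallest cost
```

The appeal of MD, which the paper carries over to manifolds, is that the mirror map adapts the update to the geometry of the constraint set, so no explicit projection is needed.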
This work is not just theoretical; it has direct, practical implications for AI model training. A flagship application shows that on the Stiefel manifold the new RMD framework reduces to the established Curvilinear Gradient Descent (CGD) method, giving that widely used algorithm a unified theoretical foundation. More importantly, the stochastic extension of RMD yields a novel stochastic CGD algorithm. This directly addresses the challenge of large-scale optimization on manifolds, which is critical for modern AI tasks such as training neural networks with orthogonal weights (e.g., certain recurrent networks), policy optimization in reinforcement learning, and image processing techniques whose parameters have inherent geometric structure.
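To make the CGD connection concrete, here is a minimal sketch of the classical curvilinear update via the Cayley transform, as popularized by Wen and Yin for optimization with orthogonality constraints; the paper's exact formulation and step-size rule may differ. Swapping the exact gradient `G` for a minibatch estimate gives a naive stochastic variant, which is only an assumption about how the paper's stochastic CGD might look in code.

```python
import numpy as np

def cgd_step(X, G, tau):
    """One curvilinear gradient descent step on the Stiefel manifold.

    X:   n x p matrix with orthonormal columns (X.T @ X = I_p).
    G:   Euclidean gradient of the objective at X (or a stochastic estimate).
    tau: step size.

    The skew-symmetric direction W = G X^T - X G^T is fed through the
    Cayley transform Y = (I + tau/2 W)^{-1} (I - tau/2 W) X.
    """
    n = X.shape[0]
    W = G @ X.T - X @ G.T                     # skew-symmetric: W.T == -W
    A = np.eye(n) + (tau / 2.0) * W
    B = np.eye(n) - (tau / 2.0) * W
    return np.linalg.solve(A, B @ X)

# Sanity check: orthogonality is preserved to machine precision.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((5, 2)))  # random point on St(5, 2)
G = rng.standard_normal((5, 2))                   # stand-in gradient
Y = cgd_step(X, G, tau=0.1)
print(np.linalg.norm(Y.T @ Y - np.eye(2)))        # ~1e-16
```

Because W is skew-symmetric, the Cayley factor is an orthogonal matrix, so the iterate stays exactly on the manifold without any retraction or projection step.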
- Generalizes Mirror Descent to Riemannian manifolds, enabling optimization on curved spaces like the Stiefel manifold.
- Provides both deterministic (RMD) and stochastic variants with formal non-asymptotic convergence guarantees.
- Shows that RMD reduces to Curvilinear Gradient Descent (CGD) on the Stiefel manifold, enabling large-scale constrained optimization in AI training.
Why It Matters
Enables more efficient and theoretically sound training of AI models with geometric constraints, such as neural networks with orthogonal weight matrices.