Research & Papers

Inversion-Free Natural Gradient Descent on Riemannian Manifolds

New algorithm eliminates costly matrix inversions, enabling faster optimization on constrained parameter spaces.

Deep Dive

A team of researchers including Dario Draca, Takuo Matsubara, and Minh-Ngoc Tran has developed a novel optimization algorithm that extends natural gradient descent to Riemannian manifolds while eliminating the computational bottleneck of matrix inversion. Traditional natural gradient methods assume Euclidean parameter spaces, but many real-world optimization problems involve parameters with inherent constraints—like positive definiteness in covariance matrices or orthogonality in rotation matrices—that naturally live on curved manifolds. The Riemannian approach allows these constraints to be enforced implicitly while maintaining parameter identifiability and geodesic convexity properties.
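
As a concrete illustration of the constraint problem (not taken from the paper), the sketch below compares a plain Euclidean gradient step on a 2x2 covariance matrix, which can push the iterate out of the positive-definite cone, with a standard geodesic step under the affine-invariant metric, which keeps it there. The helper names are made up for this example, and it only illustrates staying on the manifold, not the natural-gradient preconditioning itself.

    # Illustrative only: why a covariance parameter needs manifold-aware updates.
    # Helper names are hypothetical; this is not the paper's algorithm.
    import numpy as np

    def _sym_apply(A, f):
        """Apply a scalar function f to a symmetric matrix via its eigendecomposition."""
        w, V = np.linalg.eigh(A)
        return V @ np.diag(f(w)) @ V.T

    def euclidean_step(S, grad, lr):
        """Plain gradient step; nothing keeps the iterate positive definite."""
        return S - lr * grad

    def geodesic_step(S, grad, lr):
        """Step along the SPD manifold (affine-invariant metric):
        S^{1/2} exp(-lr * S^{-1/2} grad S^{-1/2}) S^{1/2} stays positive definite."""
        S_half = _sym_apply(S, np.sqrt)
        S_half_inv = _sym_apply(S, lambda w: 1.0 / np.sqrt(w))
        inner = S_half_inv @ grad @ S_half_inv
        return S_half @ _sym_apply(inner, lambda w: np.exp(-lr * w)) @ S_half

    S = np.array([[1.0, 0.9], [0.9, 1.0]])   # symmetric positive definite start
    G = np.array([[2.0, 0.0], [0.0, 2.0]])   # a symmetric Euclidean gradient
    print(np.linalg.eigvalsh(euclidean_step(S, G, 0.5)))  # one eigenvalue turns negative
    print(np.linalg.eigvalsh(geodesic_step(S, G, 0.5)))   # eigenvalues remain positive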

Their inversion-free method builds on an intrinsic formulation of the Fisher information matrix (FIM) on manifolds. Rather than inverting the FIM, the algorithm maintains an online approximation of the inverse FIM, updated at quadratic cost per iteration from score vectors sampled at successive iterates. A key challenge in the Riemannian setting is that these score vectors belong to different tangent spaces and must be combined via parallel transport. The researchers prove almost-sure convergence rates of O(log s/s^α) for the squared distance to the minimizer when the step-size exponent satisfies α > 2/3, and they establish corresponding rates for the approximate FIM despite the errors accumulated through parallel transport.
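
To make the quadratic-cost recursion concrete, here is one common way such an online inverse-FIM update can be written: an exponentially weighted rank-one estimate of the FIM whose inverse is propagated with the Sherman-Morrison identity, so no matrix is ever inverted and each step costs O(n^2). The weighting scheme, the transport callable, and the function name are illustrative assumptions rather than the paper's exact recursion.

    # Sketch of a Sherman-Morrison style inverse-FIM recursion (an assumption,
    # not necessarily the paper's exact update). P_prev approximates the inverse
    # FIM in the previous tangent space; `transport` is a user-supplied parallel
    # transport of that operator into the current tangent space (identity in
    # flat space). Each call costs O(n^2).
    import numpy as np

    def inverse_fim_update(P_prev, g, beta, transport=lambda P: P):
        """Inverse of F = (1 - beta) * F_prev + beta * g g^T, given P_prev = F_prev^{-1},
        computed without any explicit matrix inversion (Sherman-Morrison identity)."""
        P = transport(P_prev)                 # move the estimate to the current tangent space
        c = beta / (1.0 - beta)
        Pg = P @ g                            # O(n^2) matrix-vector product
        P = P - np.outer(Pg, Pg) * (c / (1.0 + c * (g @ Pg)))
        return P / (1.0 - beta)

    # The preconditioned direction for the next step would then look like
    # direction = -learning_rate * (P @ riemannian_grad), followed by a retraction.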

Practical implementations include a limited-memory variant with sub-quadratic storage complexity, making the method scalable to higher-dimensional problems. The paper demonstrates the algorithm's effectiveness on statistical learning tasks including variational Bayes with Gaussian approximations and normalizing flows, showing advantages over Euclidean counterparts. By avoiding explicit matrix inversions—traditionally an O(n³) operation—the method offers significant computational savings while maintaining the theoretical benefits of natural gradient descent on constrained parameter spaces.
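
A limited-memory scheme in this spirit might keep only the m most recent transported score vectors and apply the preconditioner matrix-free through the Woodbury identity, as the hypothetical sketch below shows; the damping term, uniform weighting, and class interface are assumptions for illustration, not the paper's construction.

    # Hypothetical limited-memory sketch (names and interface are assumptions):
    # keep only the m most recent, already-transported score vectors and apply
    # the damped inverse-FIM estimate (lam*I + (1/m) * sum g_i g_i^T)^{-1} to a
    # vector with the Woodbury identity. Storage is O(n*m); one apply costs
    # O(n*m + m^3), sub-quadratic in n when m << n.
    import numpy as np
    from collections import deque

    class LimitedMemoryInverseFIM:
        def __init__(self, memory=10, lam=1e-3):
            self.scores = deque(maxlen=memory)  # most recent transported score vectors
            self.lam = lam                      # damping so the estimate stays invertible

        def update(self, score):
            self.scores.append(np.asarray(score, dtype=float))

        def apply(self, v):
            """Return (lam*I + G G^T)^{-1} v, where G holds the scaled score vectors."""
            if not self.scores:
                return v / self.lam
            m = len(self.scores)
            G = np.stack(self.scores, axis=1) / np.sqrt(m)   # n x m
            small = self.lam * np.eye(m) + G.T @ G           # m x m system
            return (v - G @ np.linalg.solve(small, G.T @ v)) / self.lam

In the Riemannian setting the stored vectors would also need to be re-transported into the current tangent space at every iteration, a bookkeeping step omitted from this sketch.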

Key Points
  • Eliminates expensive matrix inversions in natural gradient descent, reducing computational complexity
  • Proves convergence rates of O(log s/s^α) for squared distance to minimizer with α > 2/3
  • Demonstrates effectiveness on variational Bayes and normalizing flows compared to Euclidean methods

Why It Matters

Enables faster, more stable optimization for constrained parameter problems in machine learning and statistics.