Research & Papers

New paper proves optimal Edgeworth expansions for finite-width neural nets

How well do finite-width networks approximate their infinite-width limits?

Deep Dive

A new paper by Lucia Celli (arXiv:2605.24072) tackles a fundamental question in deep learning theory: how far do finite-width neural networks deviate from their infinite-width Gaussian process limit? The answer comes in the form of optimal non-asymptotic Edgeworth expansions, which correct the limiting Gaussian law with higher-order terms built from the cumulants of the finite-width network. For a network evaluated on a finite set of inputs, the law of the output can be approximated by an Edgeworth expansion of arbitrary order 4m-1, with a total variation error of order n^{-m}, where n is the large parameter of the problem, tied to the network width. Crucially, the author proves matching lower bounds, showing these rates are sharp.
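For orientation, the simplest univariate case can be written out. This is the textbook first-order form for a centered, unit-variance law whose odd cumulants vanish, not the paper's multivariate statement; the symbols p_n, \kappa_{4,n}, \varphi and He_4 are notation assumed here for illustration.

```latex
% Illustrative univariate first-order Edgeworth correction (notation assumed here).
\[
\begin{aligned}
p_n(x) &\approx \varphi(x)\left[\,1 + \frac{\kappa_{4,n}}{24}\,\mathrm{He}_4(x)\right],
\qquad \mathrm{He}_4(x) = x^4 - 6x^2 + 3,
\qquad \varphi(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}, \\[4pt]
d_{\mathrm{TV}}(P, Q) &= \sup_{A}\,\lvert P(A) - Q(A)\rvert
 = \frac{1}{2}\int \lvert p(x) - q(x)\rvert \,dx .
\end{aligned}
\]
```

Here \kappa_{4,n} is the fourth cumulant of the finite-n law, and the second line is the total variation distance in which the approximation error is measured.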

The results rely on standard assumptions (an invertible covariance matrix and a polynomially bounded activation) and extend to any sequence of conditionally Gaussian vectors converging to a Gaussian with invertible covariance. As a concrete application, Celli quantifies the error in Bayesian posterior distributions when the prior is replaced by its Edgeworth expansion, a common practical shortcut. This work provides the first optimal non-asymptotic guarantees for multivariate Edgeworth expansions in the context of neural networks, narrowing the gap between theory and practice. For researchers and practitioners using Bayesian deep learning or Gaussian process approximations, these bounds offer rigorous error control and a path to more accurate finite-width corrections.
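The conditionally Gaussian structure can be seen in a toy case. The sketch below is not taken from the paper: it assumes a one-hidden-layer network with identity activation, a single scalar input x = 1, no biases, and n hidden units, so the output is f = n^{-1/2} * sum_i v_i w_i with independent standard Gaussian weights. Conditionally on the first layer, f is Gaussian with variance S = chi^2_n / n; its fourth cumulant is 6/n, so the first Edgeworth correction to the N(0, 1) limit is phi(x) * (1 + He_4(x) / (4n)). The function names and grid sizes are illustrative choices.

```python
import math


def gaussian_pdf(x, var=1.0):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def variance_weights(n, num=400):
    """Midpoint-rule nodes and normalized weights for S = chi^2_n / n,
    the conditional variance of the toy network output (an assumption
    of this sketch, valid for the identity activation only)."""
    k = n / 2.0                      # S ~ Gamma(shape k, rate k)
    sd = math.sqrt(2.0 / n)          # standard deviation of S
    lo = max(0.0, 1.0 - 12.0 * sd)
    hi = 1.0 + 12.0 * sd
    h = (hi - lo) / num
    nodes = [lo + (i + 0.5) * h for i in range(num)]
    # Gamma density evaluated through logs for numerical stability.
    log_c = k * math.log(k) - math.lgamma(k)
    raw = [math.exp(log_c + (k - 1.0) * math.log(s) - k * s) * h for s in nodes]
    total = sum(raw)
    return nodes, [r / total for r in raw]


def tv_distances(n, x_max=10.0, num_x=400):
    """Total variation distance from the exact output law to
    (a) the N(0, 1) limit and (b) the first Edgeworth correction."""
    nodes, weights = variance_weights(n)
    h = x_max / num_x
    tv_gauss = 0.0
    tv_edge = 0.0
    for j in range(num_x):
        x = (j + 0.5) * h
        # Exact density: Gaussian scale mixture over the conditional variance.
        exact = sum(w * gaussian_pdf(x, s) for s, w in zip(nodes, weights))
        phi = gaussian_pdf(x)
        he4 = x**4 - 6.0 * x**2 + 3.0
        edge = phi * (1.0 + he4 / (4.0 * n))   # kappa_4 / 24 = (6 / n) / 24
        # All densities are even, so TV = (1/2) * integral over R
        # equals the integral over [0, infinity).
        tv_gauss += abs(exact - phi) * h
        tv_edge += abs(exact - edge) * h
    return tv_gauss, tv_edge


results = {n: tv_distances(n) for n in (64, 128, 256)}
for n, (tv_g, tv_e) in results.items():
    print(f"n={n:4d}  TV(Gaussian)={tv_g:.2e}  TV(first correction)={tv_e:.2e}")
```

In this toy model one would expect the Gaussian error to shrink roughly like 1/n and the corrected error roughly like 1/n^2, in the spirit of the n^{-m} rates. The paper's results concern the general multivariate setting, which this sketch does not reproduce.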

Key Points
  • Establishes optimal non-asymptotic total variation bounds of order n^{-m} for Edgeworth expansions of arbitrary order 4m-1
  • Applies to fully connected networks with Gaussian initialization and polynomially bounded activation functions
  • Provides rigorous error quantification for Bayesian posterior approximations using Edgeworth-expanded priors

Why It Matters

Rigorous error bounds for finite-width corrections enable safer Bayesian deep learning and tighter Gaussian process approximations.