Uniform-in-time concentration in two-layer neural networks via transportation inequalities
New mathematical proof shows that SGD-trained two-layer networks stay close to their predicted mean-field behavior throughout training, supporting more reliable AI.
A team of researchers from LMBP (Arnaud Guillin, Boris Nectoux, and Paul Stos) has published a significant mathematical proof addressing a fundamental concern in neural network training. Their paper, 'Uniform-in-time concentration in two-layer neural networks via transportation inequalities,' demonstrates that neural networks trained via Stochastic Gradient Descent (SGD) maintain predictable behavior and stay close to their theoretical mean-field limit throughout the entire training process, not just at convergence. This work provides the first uniform-in-time bounds on the discrepancy between the actual SGD trajectory and its mean-field approximation for two-layer networks with quadratic loss and ridge regularization, offering mathematical guarantees that were previously elusive.
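For orientation, here is a minimal sketch of the kind of setting the paper studies. The notation below is illustrative and assumed, not the authors' exact formulation: a two-layer network of width N is written as an average of neurons with parameters X^i, trained on a quadratic loss with ridge regularization, and the object compared with its mean-field limit is the empirical measure of the parameters.

```latex
% Illustrative sketch only; sigma, lambda, and the exact scaling are assumptions.
% Two-layer network of width N as an average over neurons:
\[
  f_N(x; X^1,\dots,X^N) \;=\; \frac{1}{N}\sum_{i=1}^{N} \sigma(x;\, X^i).
\]
% Quadratic loss with ridge regularization on the parameters:
\[
  \mathcal{L}_N \;=\; \mathbb{E}_{(x,y)}\big[(y - f_N(x))^2\big]
  \;+\; \frac{\lambda}{N}\sum_{i=1}^{N} |X^i|^2 .
\]
% SGD updates the X^i; the central object is the empirical parameter measure at iteration k,
\[
  \mu^N_k \;=\; \frac{1}{N}\sum_{i=1}^{N} \delta_{X^i_k},
\]
% which is compared, uniformly over k, with its mean-field limit \bar\mu_k.
```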
The technical breakthrough centers on establishing Tₚ transportation inequalities (for p ∈ {1, 2}) for the law of SGD parameters, with explicit constants that don't depend on the iteration index. This allows the researchers to prove that the empirical parameter measure concentrates around its mean-field limit in the Wasserstein-1 (W₁) distance uniformly over time. They translate these mathematical bounds into practical prediction-error estimates against fixed test functions. Furthermore, by deriving analogous bounds in the sliced-Wasserstein distance (SW₁), they achieve dimension-free convergence rates, making the results applicable to high-dimensional problems common in modern AI. This foundational work provides theoretical underpinning for the observed stability of SGD in practice and could inform the development of more robust training algorithms.
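In rough terms, a T₁ transportation inequality bounds a Wasserstein distance by relative entropy; when its constant does not depend on the iteration index, concentration estimates can be made uniform in time. The display below is only a schematic of this mechanism under assumed notation (C, C₁, C₂, εₙ, and the exact exponents are placeholders, not the paper's stated constants):

```latex
% Schematic T_1 transportation inequality for the law \nu of the SGD parameters:
\[
  W_1(\eta,\nu) \;\le\; \sqrt{2C\, H(\eta \,|\, \nu)}
  \qquad \text{for all probability measures } \eta,
\]
% where H(\cdot|\cdot) is relative entropy and C is independent of the iteration index.
% Schematic uniform-in-time concentration of the empirical measure around the mean-field limit:
\[
  \sup_{k \ge 0}\; \mathbb{P}\Big( W_1(\mu^N_k, \bar\mu_k) \;\ge\; \varepsilon_N + r \Big)
  \;\le\; C_1\, e^{-C_2 N r^2}.
\]
% The sliced-Wasserstein distance averages one-dimensional W_1 distances over random directions,
\[
  SW_1(\mu,\nu) \;=\; \int_{\mathbb{S}^{d-1}} W_1\big(\theta_{\#}\mu,\, \theta_{\#}\nu\big)\, d\sigma(\theta),
\]
% which is how the dependence on the parameter dimension d can be removed from the rates.
```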
- Proves uniform-in-time concentration bounds for SGD-trained two-layer neural networks around their mean-field limit
- Establishes T₁ and T₂ transportation inequalities with iteration-independent constants for the first time
- Achieves dimension-free rates using the sliced-Wasserstein distance, making the results applicable to high-dimensional AI models (see the estimator sketch after this list)
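To make the dimension-free point concrete, here is a small, self-contained sketch (not from the paper) of the standard Monte Carlo estimator of SW₁ between two empirical measures: project both samples onto random directions, compute the one-dimensional W₁ on each projection via sorting, and average. All function and variable names here are hypothetical.

```python
import numpy as np

def sliced_w1(samples_a, samples_b, n_projections=200, rng=None):
    """Monte Carlo estimate of the sliced-Wasserstein-1 distance between two
    empirical measures given by equally sized samples of shape (n, d).

    Each random direction theta reduces the problem to 1D, where W_1 is the
    mean absolute difference of the sorted projections."""
    rng = np.random.default_rng(rng)
    n, d = samples_a.shape
    assert samples_b.shape == (n, d), "this simple sketch assumes equal sample sizes"

    # Draw random unit vectors on the sphere S^{d-1}.
    thetas = rng.standard_normal((n_projections, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)

    # Project both samples onto each direction: result has shape (n, n_projections).
    proj_a = samples_a @ thetas.T
    proj_b = samples_b @ thetas.T

    # 1D W_1 between equal-size empirical measures = mean |sorted_a - sorted_b|.
    w1_per_direction = np.mean(
        np.abs(np.sort(proj_a, axis=0) - np.sort(proj_b, axis=0)), axis=0
    )
    return float(np.mean(w1_per_direction))

# Illustration: compare an "empirical parameter measure" with samples from a
# hypothetical mean-field reference, here both just Gaussians in high dimension.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 512
    mu_n = rng.standard_normal((1000, d))
    bar_mu = rng.standard_normal((1000, d)) * 1.05
    print(sliced_w1(mu_n, bar_mu, n_projections=300, rng=1))
```

The estimator only ever sorts one-dimensional projections, which is why its statistical behavior does not degrade with the ambient dimension d in the way direct W₁ estimation does.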
Why It Matters
Provides mathematical guarantees that SGD-trained two-layer networks stay close to their predicted mean-field behavior throughout training, reducing the risk of unexpected model behavior at deployment.