New Survey Unifies Generalization Bounds for Overparameterized Neural Nets
How do overparameterized neural nets still generalize? This survey lays out the math.
Deep neural networks often generalize remarkably well despite being heavily overparameterized, which is at odds with classical learning theory based on uniform convergence over fixed hypothesis spaces. This survey by Hubert Leroux, Jean Marcus, and Julien Roger organizes a line of recent work that recovers non-vacuous guarantees by restricting attention to the region of parameter space actually visited by the training algorithm. The paper is built around three key steps: extending PAC-Bayesian theory to random, data-dependent hypothesis sets (arXiv:2404.17442); refining the complexity term using geometric and topological descriptors of the optimization trajectory, such as fractal dimensions, alpha-weighted lifetime sums, and positive magnitude (arXiv:2006.09313, arXiv:2302.02766, arXiv:2407.08723); and replacing information-theoretic terms with stability assumptions (arXiv:2507.06775).
The survey unifies these contributions under a single template inequality and provides a head-to-head comparison of the resulting bounds. It spans 15 pages with 4 figures and 3 tables, and is written in JMLR preprint style. By synthesizing theoretical advances from PAC-Bayes, fractal geometry, and algorithmic stability, the authors offer a coherent framework for understanding why overparameterized models generalize. This matters for AI researchers and engineers looking to design better training algorithms, because it narrows the gap between empirical success and theoretical understanding. The paper is available on arXiv (2605.13913) and cites five central references that form the backbone of this emerging theory.
- Extends PAC-Bayesian theory to handle random, data-dependent hypothesis sets (arXiv:2404.17442).
- Refines complexity using fractal dimensions and alpha-weighted lifetime sums from optimization trajectories (arXiv:2006.09313, arXiv:2302.02766).
- Unifies stability-based and information-theoretic bounds into a single template inequality for comparison.
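To make one of the trajectory descriptors mentioned above concrete, here is a minimal sketch of an alpha-weighted lifetime sum in homological degree 0. It is not taken from the survey. It relies on a standard fact: for the Vietoris-Rips filtration of a finite point cloud, the finite degree-0 lifetimes equal the edge lengths of a minimum spanning tree (up to a constant factor that depends on the filtration convention). The function name `alpha_weighted_lifetime_sum` and the toy point clouds are illustrative assumptions; in practice the points would be weight iterates saved during training.

```python
import math


def alpha_weighted_lifetime_sum(points, alpha=1.0):
    """Sum of (MST edge length) ** alpha over a finite point cloud.

    Illustrative helper, not the survey's code. Each point is a sequence
    of floats, e.g. a flattened weight vector from one training iterate.
    """
    n = len(points)
    if n < 2:
        return 0.0

    # Prim's algorithm on the complete Euclidean graph, O(n^2) time.
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each vertex
    u = 0                  # start the tree at the first point
    in_tree[u] = True
    total = 0.0
    for _ in range(n - 1):
        # Relax distances using the vertex most recently added to the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
        # Attach the closest remaining vertex; its edge is one H0 lifetime.
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** alpha
    return total


# Toy example: corners of a unit square. The MST has three edges of length 1.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(alpha_weighted_lifetime_sum(square, alpha=1.0))  # 3.0
```

The quadratic-time Prim loop keeps the sketch dependency-free. For long trajectories, a sparse-graph minimum spanning tree routine or a persistent homology library would be the more practical choice.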
Why It Matters
The survey provides a theoretical foundation for why overparameterized models generalize, which may help guide the design of training methods and architectures.