The other paper that killed deep learning theory
How a 2019 paper shattered attempts to explain neural network generalization...
Zhang et al.'s 2016 paper 'Understanding deep learning requires rethinking generalization' sent shockwaves through deep learning theory by demonstrating that standard neural networks can perfectly fit randomly labeled training data. Since the same hypothesis class that generalizes on real labels also memorizes pure noise, complexity measures of the class alone (VC dimension, Rademacher complexity) cannot distinguish the two cases, undercutting the prevailing statistical learning theory paradigm. This forced researchers to seek data-dependent generalization bounds that could explain why networks generalize on real data despite having the capacity to overfit. Key attempts included Bartlett, Foster, and Telgarsky's 2017 work on spectrally-normalized margin bounds and Neyshabur et al.'s PAC-Bayesian approach, both linking properties of the trained network, such as weight norms and output margins, to generalization error.
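To see the randomization test concretely, here is a minimal sketch of the memorization phenomenon (assumptions: a small MLP on synthetic Gaussian inputs with uniformly random labels; Zhang et al. ran the experiment at CIFAR-10 and ImageNet scale):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in for Zhang et al.'s experiment: Gaussian inputs,
# labels drawn uniformly at random, so there is no true signal to learn.
n, d, k = 512, 64, 10
X = torch.randn(n, d)
y = torch.randint(0, k, (n,))

# Small over-parameterized MLP (~300k parameters vs. 512 training points).
model = nn.Sequential(
    nn.Linear(d, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, k),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Full-batch training until the random labels are fit.
for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"train accuracy on random labels: {acc:.3f}")  # typically reaches 1.0
```

Test accuracy here would of course be at chance; the point is that low training error places no constraint by itself, because the same architecture fits noise perfectly.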
However, Nagarajan and Kolter's 2019 paper 'Uniform convergence may be unable to explain generalization in deep learning' systematically showed that these data-dependent bounds still fail to capture the generalization behavior of neural networks. Empirically, the norm-based quantities driving the bounds can grow with the training set size, so the bounds fail to shrink, and can even increase, as more data is added. More fundamentally, they constructed learning tasks in which any uniform convergence bound, even one tailored to the exact set of networks SGD actually outputs, remains nearly vacuous while the networks' true test error is small. This effectively killed the line of research attempting to use uniform convergence bounds to explain deep learning's success, leaving the field without a rigorous theoretical foundation and highlighting a fundamental gap between theory and practice in modern machine learning.
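For concreteness, the Bartlett, Foster, and Telgarsky bound has roughly the following shape (an informal restatement, assuming unit-norm inputs, 1-Lipschitz activations, and zero reference matrices; the paper's version carries extra constants and log factors):

```latex
% Informal shape of the spectrally-normalized margin bound
% (Bartlett, Foster & Telgarsky, 2017) for an L-layer network
% with weight matrices A_1, ..., A_L, margin gamma, n training points:
\Pr[\text{test error}] \;\lesssim\;
  \underbrace{\widehat{\Pr}\big[\text{training margin} \le \gamma\big]}_{\text{empirical margin loss}}
  \;+\; \frac{R_A}{\gamma \sqrt{n}},
\qquad
R_A \;=\; \Bigg(\prod_{i=1}^{L} \lVert A_i \rVert_{\sigma}\Bigg)
  \Bigg(\sum_{i=1}^{L} \frac{\lVert A_i^{\top} \rVert_{2,1}^{2/3}}
                            {\lVert A_i \rVert_{\sigma}^{2/3}}\Bigg)^{3/2}
```

The trouble Nagarajan and Kolter highlight is visible in the structure: $R_A$ multiplies norms across every layer, and these norms tend to grow with both depth and the training set size $n$, so the $1/\sqrt{n}$ decay can be swamped.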
- Zhang et al. (2016) showed neural networks can memorize random labels, challenging complexity-based statistical learning theory
- Post-2016 attempts used data-dependent bounds like spectral norm and margin to explain generalization
- Nagarajan and Kolter's 2019 paper proved that even these data-dependent bounds cannot explain deep learning generalization (see the sketch after this list)
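To get a feel for how loose such bounds are, here is a rough numeric sketch (all widths and scales are hypothetical; weights are random Gaussians, whereas trained networks typically have even larger norms) that evaluates the $R_A$ complexity term from the bound above at CIFAR-10-like scale:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ReLU net at CIFAR-10-ish sizes; random Gaussian weights
# (training typically inflates these norms further).
widths = [3072, 1024, 1024, 1024, 10]
layers = [rng.normal(0, 1 / np.sqrt(d_in), size=(d_out, d_in))
          for d_in, d_out in zip(widths[:-1], widths[1:])]

spec = [np.linalg.norm(W, 2) for W in layers]              # spectral norms
row21 = [np.linalg.norm(W, axis=1).sum() for W in layers]  # ||W^T||_{2,1}

# Spectral complexity R_A (reference matrices M_i = 0, 1-Lipschitz activations).
R_A = np.prod(spec) * sum((r / s) ** (2 / 3)
                          for r, s in zip(row21, spec)) ** 1.5

n, gamma = 50_000, 1.0  # CIFAR-10 training size; optimistic unit margin
bound_term = R_A / (gamma * np.sqrt(n))  # assumes unit-norm inputs
print(f"R_A = {R_A:.3e}")
print(f"complexity term ~ R_A / (gamma * sqrt(n)) = {bound_term:.1f}")
# A bound on a [0,1] error rate is informative only below 1; here the
# complexity term is already in the hundreds at random initialization.
```

A generalization bound on a [0,1] error rate only says something once it drops below 1, and Nagarajan and Kolter's constructions show this looseness is not merely a matter of constants.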
Why It Matters
Deep learning theory still lacks a rigorous explanation for why overparameterized networks generalize, which limits our ability to give guarantees about model reliability and leaves a central question of AI research open.