The other paper that killed deep learning theory
Nagarajan and Kolter's work shatters data-dependent complexity measures for neural networks.
In 2016, Zhang et al. dropped a bombshell by showing that standard neural networks can fit randomly labeled training data perfectly. If the same architecture that generalizes on real labels can also memorize pure noise, then class-wide complexity measures (like VC dimension) are too coarse to explain generalization, undermining a core premise of statistical learning theory. Researchers scrambled to develop data-dependent bounds that depend on the specific trained network rather than the whole hypothesis class: Bartlett, Foster, and Telgarsky's 2017 spectral complexity bound controls the generalization gap via a margin-normalized product of the layers' spectral norms, and Neyshabur et al. (2018) proposed a PAC-Bayesian bound combining spectral norms with distance from initialization.
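For concreteness, here is the Bartlett, Foster, and Telgarsky bound in the simplified form usually quoted (I have dropped the reference matrices and activation Lipschitz constants from their full statement). Here A_1, ..., A_L are the weight matrices of an L-layer ReLU network f, γ > 0 is the margin, n the sample size, and ‖X‖_F the Frobenius norm of the data matrix:

```latex
% Simplified spectrally-normalized margin bound (Bartlett, Foster & Telgarsky, 2017).
% \widehat{R}_\gamma(f) is the fraction of training points with margin below \gamma;
% \|\cdot\|_\sigma is the spectral norm and \|\cdot\|_{2,1} a group (2,1) norm.
\Pr\!\left[\text{test misclassification}\right]
  \le \widehat{R}_\gamma(f)
    + \widetilde{O}\!\left(\frac{\|X\|_F \, R_A}{\gamma\, n}\right)
    + O\!\left(\sqrt{\frac{\ln(1/\delta)}{n}}\right),
\quad\text{where}\quad
R_A = \prod_{i=1}^{L} \|A_i\|_\sigma
      \left(\sum_{i=1}^{L}
        \frac{\|A_i\|_{2,1}^{2/3}}{\|A_i\|_\sigma^{2/3}}\right)^{3/2}.
```

The product over layers is the term to watch: every layer contributes a multiplicative factor to R_A, which is why the bound's numerical value tends to grow geometrically with depth.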
Then came Nagarajan and Kolter's 2019 paper, 'Uniform convergence may be unable to explain generalization in deep learning.' They showed that these data-dependent bounds behave badly exactly where deep learning shines: the layer-wise factors compound, so the bounds are numerically vacuous for realistically deep and wide networks. More damning, the bounds can actually increase with the size of the training set, the opposite of what a meaningful generalization bound should do. They then constructed settings in which even the tightest possible uniform convergence bound is provably near-vacuous, even though the trained network generalizes well. This suggested that uniform convergence (the workhorse tool of statistical learning theory) might be fundamentally incapable of explaining generalization in deep learning. The paper effectively closed the door on years of research trying to adapt classical theory to neural networks, leaving the field searching for entirely new frameworks.
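To see the depth problem concretely, here is a minimal numpy sketch (my own illustration, not an experiment from either paper; the width, the depths, and the He-style initialization scale are arbitrary choices). It measures the product of layer spectral norms, the factor that compounds inside R_A above:

```python
# Sketch: the product of layer spectral norms grows geometrically with depth.
# Random He-initialized square layers stand in for a trained network here;
# trained weights differ in detail, but the compounding effect is the same.
import numpy as np

rng = np.random.default_rng(0)
width = 256

for depth in (2, 4, 8, 16, 32):
    layers = [rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
              for _ in range(depth)]
    # ord=2 gives the spectral norm (largest singular value).
    norms = [np.linalg.norm(W, ord=2) for W in layers]
    print(f"depth={depth:2d}  product of spectral norms ~ {np.prod(norms):.2e}")
```

Each random layer here has spectral norm near 2√2 ≈ 2.8, so the product scales roughly like 2.8^L; by depth 32 it is around 10^14, and any bound carrying that factor sits astronomically far from the small generalization gaps observed in practice.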
- Zhang et al. (2016) showed neural nets can memorize random labels, challenging statistical learning theory (a toy sketch of this experiment appears after this list).
- Post-2016, researchers proposed data-dependent bounds like spectral complexity and margin-based measures.
- Nagarajan and Kolter (2019) showed these bounds are vacuous for realistic deep networks and can even grow with training set size, and proved uniform convergence can fail even for models that generalize.
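To make the first bullet concrete, here is a toy version of the memorization experiment. Zhang et al. used large convnets on CIFAR-10; this sketch instead fits scikit-learn's MLPClassifier to Gaussian noise with random labels, so treat it as an illustration of the phenomenon rather than a replication:

```python
# Toy random-label memorization experiment (in the spirit of Zhang et al.).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_samples, n_features = 500, 20
X = rng.normal(size=(n_samples, n_features))   # inputs: pure noise
y = rng.integers(0, 2, size=n_samples)         # labels: coin flips

# Heavily overparameterized MLP: ~280k weights for 500 training points.
net = MLPClassifier(hidden_layer_sizes=(512, 512), max_iter=2000,
                    random_state=0)
net.fit(X, y)

# Typically prints 1.0 (or very close): the network memorizes noise.
print("train accuracy on random labels:", net.score(X, y))
```

If the hypothesis class can interpolate coin flips, any complexity measure defined over the whole class is too large to give a nonvacuous bound, which is exactly why the field turned to the data-dependent measures in the second bullet.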
Why It Matters
This reshapes AI theory: if uniform convergence cannot explain why deep networks generalize, classical generalization bounds may be irrelevant to deep learning, and the field needs genuinely new frameworks.