Research & Papers

A Compression Perspective on Simplicity Bias

A new theoretical paper frames neural network learning as an optimal compression problem, predicting when networks transition from simple shortcut features to complex, generalizable ones.

Deep Dive

A team of eight researchers, including Tom Marty, Eric Elmoznino, and Guillaume Lajoie, has published a new theoretical paper titled 'A Compression Perspective on Simplicity Bias' on arXiv. The work tackles the well-documented but poorly understood 'simplicity bias' of deep neural networks: their tendency to learn simple, often spurious features before complex ones. The researchers recast supervised learning in terms of the Minimum Description Length (MDL) principle, framing it as a task of optimal two-part lossless compression. This formulation exposes a fundamental trade-off between the cost of describing the model's hypothesis and the cost of describing the training data given that hypothesis.
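The two-part trade-off can be sketched numerically. In the minimal sketch below, the model and per-example data costs are made-up illustrative numbers, not values from the paper:

```python
def two_part_code_length(model_bits, data_bits_per_example, n_examples):
    """Total two-part MDL cost in bits: the cost of describing the
    hypothesis plus the cost of the data given that hypothesis."""
    return model_bits + data_bits_per_example * n_examples

# Hypothetical costs for illustration (not from the paper):
# a 'shortcut' feature is cheap to describe but encodes the labels poorly;
# a generalizable feature is costly to describe but encodes the labels well.
shortcut = two_part_code_length(model_bits=50, data_bits_per_example=0.40, n_examples=1000)
general = two_part_code_length(model_bits=500, data_bits_per_example=0.05, n_examples=1000)

print(shortcut, general)  # 450.0 550.0 -> at n=1000 the shortcut is the cheaper code
```

Under these toy numbers, the simple shortcut feature is the better overall compressor at this dataset size, which is exactly the selection behavior the theory attributes to simplicity bias.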

This compression-based theory yields testable predictions about how neural networks select features. It posits that a learner will only transition from simple 'shortcut' features to more complex, generalizable ones when the resulting reduction in data encoding cost outweighs the increase in the model's own description cost. The framework thereby identifies distinct data regimes. In one, adding training data promotes robustness by making simple spurious correlations too expensive to encode, forcing the network to learn better features. In another, limiting data acts as a novel form of 'complexity-based regularization,' preventing the model from latching onto unreliable, complex environmental cues that fail to generalize. The team validated the theory on a semi-synthetic benchmark, demonstrating that neural networks follow the same solution trajectory as optimal two-part compressors.
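The boundary between these regimes falls out of the accounting directly: with a fixed per-hypothesis description cost and a data cost that scales with dataset size, there is a crossover sample count beyond which the complex feature becomes the cheaper overall code. A sketch with made-up illustrative costs (not taken from the paper):

```python
# Hypothetical description costs (bits), for illustration only:
shortcut_model_bits, shortcut_bits_per_example = 50.0, 0.40
general_model_bits, general_bits_per_example = 500.0, 0.05

# Crossover: solve model_s + d_s * n == model_g + d_g * n for n.
n_star = (general_model_bits - shortcut_model_bits) / (
    shortcut_bits_per_example - general_bits_per_example
)

# Below ~1286 examples, limiting data 'regularizes' toward the simple
# feature; above it, more data makes the spurious shortcut too
# expensive to keep encoding, and the complex feature wins.
print(round(n_star))  # 1286
```

This is only a linear toy model of the trade-off, but it mirrors the paper's qualitative prediction that dataset size alone can flip which feature the optimal compressor selects.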

Key Points
  • Frames neural network learning as an optimal two-part compression problem using the Minimum Description Length (MDL) principle.
  • Predicts that models transition from simple to complex features only when the data cost reduction outweighs the model complexity cost.
  • Identifies specific data regimes: one where more data increases robustness, and another where less data acts as complexity regularization.

Why It Matters

Provides a theoretical foundation for dataset design and regularization strategies, helping practitioners build more robust and generalizable AI models.