Research & Papers

Occam's Razor is Only as Sharp as Your ELBO

Even the ELBO, a key tool for model selection, can overfit under certain conditions.

Deep Dive

A new paper by Ethan Harvey and Michael C. Hughes challenges the conventional wisdom that the evidence lower bound (ELBO) from variational inference always promotes model simplicity via Occam's razor. The authors demonstrate that in a simple over-parameterized regression model, ELBO-based hyperparameter learning can itself produce overfitting, depending on the assumed rank of the covariance matrix in the Gaussian approximate posterior. This contradicts prior work, which highlighted only underfitting from mean-field approximations. Surprisingly, the marginal likelihood (evidence) itself sometimes prefers the overfit variant over the underfit one, while the ELBO does not, showing that exact Bayesian model selection and its variational surrogate can pull in opposite directions.
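
The mechanism is easiest to see through the standard decomposition of the log marginal likelihood (written here in generic notation, not the paper's symbols): the evidence splits into the ELBO plus the KL divergence from the approximate posterior to the true posterior, so maximizing the ELBO over hyperparameters mimics evidence-based model selection only when that gap stays small, or at least roughly constant, across hyperparameter values.

```latex
\log p(y \mid \alpha)
  \;=\; \underbrace{\mathbb{E}_{q(w)}\!\left[ \log \frac{p(y, w \mid \alpha)}{q(w)} \right]}_{\mathrm{ELBO}(q,\,\alpha)}
  \;+\; \underbrace{\mathrm{KL}\!\big( q(w) \,\|\, p(w \mid y, \alpha) \big)}_{\text{approximation gap} \;\ge\; 0}
```

When the covariance of a Gaussian q(w) is restricted, say to reduced rank, the size of that gap can change sharply with the hyperparameter α, so the α that maximizes the ELBO need not be the one the evidence would choose.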

For Bayesian practitioners scaling to large models, the findings are a cautionary tale: reduced-rank assumptions adopted for tractability can distort model selection. The paper, submitted to arXiv under stat.ML and cs.LG, argues that the ELBO's ability to embody Occam's razor is limited by the quality of the approximate posterior. As AI models grow, the work underscores the need to validate variational methods on more than their computational convenience.
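
To make the failure mode concrete, here is a minimal numpy/scipy sketch, not the paper's setup or code: conjugate Bayesian linear regression, where both the exact log evidence and the ELBO at the best factorized (diagonal-covariance) Gaussian have closed forms. The restriction on q's covariance here is a diagonal rather than the paper's reduced-rank family, but it illustrates the same mechanism: sweeping the prior precision α shows which hyperparameter each criterion selects.

```python
# A minimal sketch (not the paper's setup or code): conjugate Bayesian linear
# regression, where the exact log evidence and the ELBO at the optimal
# factorized (diagonal-covariance) Gaussian both have closed forms.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Over-parameterized regression: more features (d) than observations (n).
n, d, sigma2 = 20, 50, 0.25
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:5] = rng.normal(size=5)              # only a few features matter
y = X @ w_true + np.sqrt(sigma2) * rng.normal(size=n)

def log_evidence(alpha):
    """Exact log marginal likelihood under the prior w ~ N(0, I / alpha)."""
    C = sigma2 * np.eye(n) + (X @ X.T) / alpha
    return multivariate_normal.logpdf(y, mean=np.zeros(n), cov=C)

def elbo_mean_field(alpha):
    """ELBO at the best factorized Gaussian q.

    The exact posterior is Gaussian with precision Lam; the optimal factorized
    Gaussian keeps the posterior mean and uses variances 1 / Lam_ii, giving
    ELBO = log evidence - 0.5 * (sum_i log Lam_ii - log det Lam).
    """
    Lam = alpha * np.eye(d) + (X.T @ X) / sigma2
    _, logdet = np.linalg.slogdet(Lam)
    gap = 0.5 * (np.sum(np.log(np.diag(Lam))) - logdet)  # KL(q || posterior) >= 0
    return log_evidence(alpha) - gap

alphas = np.logspace(-3, 3, 61)
evidence = np.array([log_evidence(a) for a in alphas])
elbo = np.array([elbo_mean_field(a) for a in alphas])
print("alpha preferred by the evidence:", alphas[np.argmax(evidence)])
print("alpha preferred by the ELBO    :", alphas[np.argmax(elbo)])
```

Because the approximation gap varies with α in the over-parameterized regime, the restriction on q can pull the selected hyperparameter away from the evidence's choice; the paper shows the bias can also run toward overfitting under reduced-rank covariance assumptions.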

Key Points
  • ELBO-based hyperparameter learning can overfit in over-parameterized regression, depending on the covariance matrix rank.
  • Bayesian model selection via the evidence sometimes prefers overfit models over underfit ones, unlike the ELBO.
  • Reduced-rank approximations, common for tractability, may impair model selection in large-scale Bayesian models (the parameter-count sketch below shows why such restrictions are hard to avoid at scale).
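
For a rough sense of why full-covariance Gaussian posteriors are off the table at scale (illustrative numbers, not from the paper):

```python
# Rough parameter counts for a d-dimensional Gaussian approximate posterior.
# Hypothetical model size for illustration only, not taken from the paper.
d, k = 10_000_000, 8                 # weight count of a largish model, small rank budget
full_cov = d * (d + 1) // 2          # full covariance: ~5e13 parameters, intractable
diagonal = d                         # mean-field (diagonal) covariance
low_rank = d * k + d                 # rank-k factor plus a diagonal term
print(f"full: {full_cov:.2e}  diagonal: {diagonal:.2e}  low-rank+diag: {low_rank:.2e}")
```

The trade-off the paper highlights is that the same restriction that makes the covariance affordable also determines how faithfully the ELBO tracks the evidence.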

Why It Matters

The result challenges a common assumption about variational inference, that maximizing the ELBO automatically enforces Occam's razor, and urges caution when using it for model selection in large-scale AI systems.