Research & Papers

The Bernstein-von Mises theorem for Bayesian one-pass online learning

No more need for large mini-batches in streaming Bayesian inference...

Deep Dive

Bayesian online learning offers a principled way to update beliefs sequentially, but existing theoretical guarantees often require mini-batch sizes to grow without bound—a condition that fails in the critical one-pass (streaming) regime where each data point is seen only once. Jeyong Lee, Junhyeok Choi, Dongguen Kim, and Minwoo Chae introduce a new algorithm designed specifically for this setting. Their method adds a warm-start phase that stabilizes sequential posterior updates, allowing the algorithm to provably achieve the optimal convergence rate. More importantly, they establish an online analogue of the celebrated Bernstein-von Mises theorem, which guarantees that the sequentially updated posterior provides valid uncertainty quantification without requiring diverging mini-batch sizes.
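The general recipe — a warm-start phase on an initial buffered chunk, followed by strictly one-pass posterior updates — can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses an exact conjugate Gaussian update for a one-parameter linear model (known noise precision), purely to show the two-phase structure; the function names and the warm-start size of 50 are illustrative choices.

```python
import math
import random

def warm_start(data, prior_mean=0.0, prior_prec=1.0, noise_prec=1.0):
    """Batch conjugate-Gaussian update on the buffered warm-start chunk."""
    prec = prior_prec + noise_prec * sum(x * x for x, _ in data)
    mean = (prior_prec * prior_mean
            + noise_prec * sum(x * y for x, y in data)) / prec
    return mean, prec

def online_update(mean, prec, x, y, noise_prec=1.0):
    """One-pass streaming update: each observation is used exactly once."""
    new_prec = prec + noise_prec * x * x
    new_mean = (prec * mean + noise_prec * x * y) / new_prec
    return new_mean, new_prec

# Simulated stream from y = theta * x + noise with true theta = 2.0.
random.seed(0)
theta = 2.0
xs = [random.gauss(0, 1) for _ in range(500)]
stream = [(x, theta * x + random.gauss(0, 1)) for x in xs]

# Phase 1: warm start on an initial chunk to stabilize the posterior.
mean, prec = warm_start(stream[:50])

# Phase 2: one-pass updates on the rest of the stream.
for x, y in stream[50:]:
    mean, prec = online_update(mean, prec, x, y)

# A BvM-style Gaussian posterior yields a 95% credible interval directly.
half_width = 1.96 / math.sqrt(prec)
print(f"posterior mean = {mean:.3f}, 95% half-width = {half_width:.3f}")
```

In this conjugate toy model the one-pass recursion is exact, so the streaming posterior coincides with the full-batch posterior; the paper's contribution is showing that a warm-started one-pass scheme retains the optimal rate and Gaussian limiting shape in settings where no such exact recursion exists.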

The theoretical framework departs fundamentally from prior work in online learning, relying on a novel analysis that addresses the challenges of single-pass data processing. Numerical experiments on generalized linear models demonstrate that the proposed method matches the performance of the full-batch estimator—which has access to all data at once—while significantly outperforming existing online procedures. This work fills a key gap in the theory of Bayesian streaming inference, making it practical for applications like real-time recommendation systems, financial modeling, and sensor networks, where data arrives sequentially and must be processed immediately. The 52-page paper (arXiv:2604.27442) carries MSC classifications in statistics theory and machine learning.
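For context, the classical (batch) Bernstein-von Mises theorem — of which the paper establishes an online, one-pass analogue — states that under standard regularity conditions the posterior is asymptotically Gaussian:

```latex
\left\| \Pi(\cdot \mid X_1,\dots,X_n)
  - \mathcal{N}\!\left(\hat{\theta}_n,\ \tfrac{1}{n} I(\theta_0)^{-1}\right)
\right\|_{\mathrm{TV}} \;\xrightarrow{\;P_{\theta_0}\;}\; 0,
```

where $\hat{\theta}_n$ is an efficient estimator (e.g., the MLE) and $I(\theta_0)$ is the Fisher information at the true parameter. It is this Gaussian limiting shape that makes posterior credible intervals asymptotically valid frequentist confidence intervals; the paper's result extends the guarantee to sequentially updated posteriors without diverging mini-batch sizes.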

Key Points
  • New Bayesian online learning algorithm with a warm-start phase for one-pass (streaming) settings.
  • Proven optimal convergence rate and online Bernstein-von Mises theorem for valid uncertainty quantification without diverging mini-batch sizes.
  • Experiments on generalized linear models match batch estimator accuracy while outperforming existing online methods.

Why It Matters

Enables real-time, uncertainty-aware Bayesian inference on streaming data, critical for high-frequency applications.