MIT/Stanford research proves transformers can learn universal Bayesian priors
New paper shows pretrained transformers achieve near-optimal Õ(1/n) regret on empirical Bayes problems.
Researchers from MIT and Stanford published "Universal Priors," proving transformers pretrained on synthetic data can solve empirical Bayes problems. The theoretical work shows these models achieve a near-optimal regret bound of Õ(1/n) across all test distributions. This explains why models like those from Teh et al. (2025) generalize beyond their training data, performing Bayesian inference through posterior contraction to adapt to new statistical tasks.
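To make "regret" concrete, here is a minimal sketch in a toy Gaussian setting. It is an illustration only, not the paper's construction: the conjugate normal model, the plug-in shrinkage estimator, and the function name `eb_regret` are all assumptions introduced here. Regret is taken as the excess mean squared error of an empirical Bayes rule, fitted on n observations, over the oracle Bayes rule that knows the prior.

```python
import random
import statistics

def eb_regret(n, prior_var=1.0, trials=100, seed=0):
    """Average regret of a plug-in empirical Bayes rule (hypothetical toy setup).

    Assumed model: theta_i ~ N(0, prior_var), x_i | theta_i ~ N(theta_i, 1).
    The oracle Bayes rule shrinks x by prior_var / (prior_var + 1).
    """
    rng = random.Random(seed)
    bayes_shrink = prior_var / (prior_var + 1.0)
    regrets = []
    for _ in range(trials):
        # Draw n (theta, x) pairs; only x is visible to the estimator.
        xs = []
        for _ in range(n):
            theta = rng.gauss(0.0, prior_var ** 0.5)
            xs.append(theta + rng.gauss(0.0, 1.0))
        # Marginally x ~ N(0, prior_var + 1), so the mean of x^2
        # estimates prior_var + 1 without ever seeing theta.
        marginal_var = statistics.fmean(x * x for x in xs)
        eb_shrink = max(1.0 - 1.0 / marginal_var, 0.0)
        # For a linear rule c * x applied to a fresh observation, the
        # excess risk over the Bayes rule is (prior_var + 1) * (c - c*)^2.
        regrets.append((prior_var + 1.0) * (eb_shrink - bayes_shrink) ** 2)
    return statistics.fmean(regrets)

if __name__ == "__main__":
    for n in (50, 200, 800, 3200):
        r = eb_regret(n)
        print(f"n={n:5d}  regret={r:.6f}  n*regret={r * n:.3f}")
```

In this toy model the product n × regret should stay roughly constant as n grows, which is what a 1/n regret rate looks like in practice. The Õ notation in the paper's bound additionally allows logarithmic factors.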
Why It Matters
Provides a mathematical foundation for why LLMs generalize, guiding development of more robust and statistically sound AI agents.