Bayesian Modeling of Collatz Stopping Times: A Probabilistic Machine Learning Perspective
A new study applies probabilistic machine learning to model and predict Collatz stopping times for starting values up to 10 million.
Researchers Nicolò Bonacorsi and Matteo Bordoni have published a paper, 'Bayesian Modeling of Collatz Stopping Times: A Probabilistic Machine Learning Perspective,' that applies modern statistical techniques to one of mathematics' most famous unsolved problems. The Collatz map sends an even n to n/2 and an odd n to 3n + 1. The conjecture states that repeating this map from any positive integer eventually reaches 1. The study analyzes the total stopping time (the number of steps needed to reach 1) for integers up to 10 million, and it treats this quantity as a skewed, overdispersed count variable. Its main contribution is a pair of complementary probabilistic models that predict and help explain this erratic behavior. This takes the question beyond pure number theory and into data-driven inference.
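The following minimal Python sketch makes the target variable concrete. The function names `total_stopping_time` and `stopping_times_upto` are illustrative and do not come from the paper. The memoized table is one common way to compute every value up to a bound, and the authors may have used a different method.

```python
def total_stopping_time(n: int) -> int:
    # Apply n -> n/2 (even) or n -> 3n + 1 (odd), counting steps until n == 1.
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def stopping_times_upto(limit: int) -> list[int]:
    # Memoized table: times[k] is the total stopping time of k (index 0 is unused).
    times = [0] * (limit + 1)
    for start in range(2, limit + 1):
        n, steps = start, 0
        # Iterate until the trajectory drops below `start`, where the answer is
        # already cached. This terminates for every start that has been checked,
        # which includes all n <= 10**7.
        while n >= start:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            steps += 1
        times[start] = steps + times[n]
    return times


print([total_stopping_time(n) for n in range(1, 11)])
# [0, 1, 7, 2, 5, 8, 16, 3, 19, 6]
```

The cached version reuses earlier results, which is why it scales to large bounds much better than calling `total_stopping_time` once per n. In pure Python, a table up to 10**7 still takes a noticeable amount of time.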
The first model is a Bayesian hierarchical Negative Binomial regression (NB2-GLM). In the NB2 parameterization the variance is μ + αμ², which grows faster than the mean and so absorbs overdispersion. The model predicts stopping time from simple covariates such as log(n) and n mod 8 and provides full posterior uncertainty. The second model is a mechanistic generative model that randomizes the lengths of the 'odd blocks' in a Collatz trajectory. It is calibrated with a Dirichlet-multinomial update. A key finding is that, on held-out data, the regression model achieved substantially higher predictive likelihood than the generator. Conditioning the generator's block-length distribution on the residue class modulo 8 also significantly improved its fit. This indicates that low-order modular arithmetic is a primary driver of the stopping time's notorious heterogeneity. The work shows how modern probabilistic ML can shed new light on classic, hard computational problems.
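The regression model can be sketched as an unnormalized log posterior. This is a minimal illustration under stated assumptions, not the authors' code:

- It assumes a log link with linear predictor β0 + β1·log(n) + u[n mod 8].
- The eight residue offsets u are treated as group-level effects with a Normal(0, σ_u) prior, where σ_u is held fixed for brevity.
- The priors on β0, β1 and α are left flat.
- The names `nb2_logpmf`, `linear_predictor` and `log_posterior` are hypothetical.

A full Bayesian fit would pass a function like this to an MCMC sampler rather than evaluate it once.

```python
import math


def nb2_logpmf(y: int, mu: float, alpha: float) -> float:
    # NB2 log-pmf: mean mu, variance mu + alpha * mu**2 (requires alpha > 0).
    r = 1.0 / alpha  # equivalent "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def linear_predictor(n: int, b0: float, b1: float, u: list[float]) -> float:
    # log(mu) = b0 + b1 * log(n) + u[n mod 8]; u holds one offset per residue class.
    return b0 + b1 * math.log(n) + u[n % 8]


def log_posterior(data, b0, b1, u, alpha, sigma_u=1.0):
    # Assumed hierarchical prior: u[k] ~ Normal(0, sigma_u), up to a constant.
    lp = sum(-0.5 * (uk / sigma_u) ** 2 for uk in u)
    # NB2 likelihood of the observed (n, stopping time) pairs.
    for n, y in data:
        mu = math.exp(linear_predictor(n, b0, b1, u))
        lp += nb2_logpmf(y, mu, alpha)
    return lp


# (n, total stopping time) pairs for a few small starting values.
data = [(3, 7), (6, 8), (7, 16), (9, 19), (27, 111)]
lp = log_posterior(data, b0=0.0, b1=1.0, u=[0.0] * 8, alpha=0.5)
print(lp)
```

The per-residue offsets are what make the sketch hierarchical. Each residue class gets its own shift in the log-mean, and the shared prior pulls those shifts toward zero unless the data support a difference.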
- Applied Bayesian ML to Collatz stopping times for n ≤ 10^7, treating them as skewed, overdispersed counts.
- Developed two models: a Bayesian Negative Binomial regression and a mechanistic odd-block generator.
- Found that the residue n mod 8 is a key driver of heterogeneity (conditioning on it improved the generator's fit), and that the regression model outperformed the generator on held-out predictive likelihood.
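The odd-block generator and its residue-conditioned calibration might look roughly like the sketch below. It rests on several assumptions that the summary does not confirm:

- An "odd block" is taken to be the number k of halvings that follow each 3m + 1 step (the 2-adic valuation of 3m + 1).
- Blocks are grouped by the starting value's residue n mod 8.
- Lengths of K or more are pooled into one bin.
- The prior is a symmetric Dirichlet(1).
- The generator ignores the "+1" and treats each block as multiplying by 3/2^k.

All function names are hypothetical. The Dirichlet-multinomial step itself is the standard conjugate update: posterior concentrations equal prior concentrations plus observed counts.

```python
import math
import random


def block_lengths(n: int) -> list[int]:
    # Hypothetical "odd-block" definition: for each odd iterate m, record the number
    # k of halvings after the 3m + 1 step. Leading halvings of an even start are skipped.
    lengths = []
    while n != 1:
        if n % 2 == 1:
            n = 3 * n + 1
            k = 0
            while n % 2 == 0:
                n //= 2
                k += 1
            lengths.append(k)
        else:
            n //= 2
    return lengths


def dirichlet_posterior(limit: int, K: int = 8, prior: float = 1.0) -> dict[int, list[float]]:
    # One Dirichlet per residue class r = n mod 8 over block lengths 1..K
    # (lengths >= K share the last bin), updated with observed counts.
    counts = {r: [0] * K for r in range(8)}
    for n in range(2, limit + 1):
        for k in block_lengths(n):
            counts[n % 8][min(k, K) - 1] += 1
    # Conjugate update: posterior concentration = prior + counts.
    return {r: [prior + c for c in counts[r]] for r in range(8)}


def posterior_mean(alphas: list[float]) -> list[float]:
    # Mean of a Dirichlet: each concentration divided by their sum.
    total = sum(alphas)
    return [a / total for a in alphas]


def simulate_stopping_time(n: int, probs: list[float], rng: random.Random) -> int:
    # Hypothetical generator: each block maps x -> 3x / 2**k (the +1 is dropped),
    # with k drawn from probs, and contributes 1 odd step plus k halvings.
    log_x, steps = math.log(n), 0
    lengths = range(1, len(probs) + 1)
    while log_x > 0:
        k = rng.choices(lengths, weights=probs)[0]
        log_x += math.log(3) - k * math.log(2)
        steps += 1 + k
    return steps


post = dirichlet_posterior(10_000)
rng = random.Random(0)
print(simulate_stopping_time(27, posterior_mean(post[27 % 8]), rng))
```

The simulated walk drifts downward because the average block length is close to 2 and log 3 < 2 log 2. Swapping `post[27 % 8]` for a pooled distribution is one way to see how much the residue conditioning changes the generated stopping times.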
Why It Matters
Demonstrates how probabilistic ML can provide new, data-driven insights into famously intractable mathematical problems.