Research & Papers

Random Forests as Statistical Procedures: Design, Variance, and Dependence

A new statistical framework shows that predictive variance cannot be eliminated just by adding more trees.

Deep Dive

A new paper reframes random forests as explicit finite-sample statistical designs rather than just algorithms. It provides an exact variance decomposition, showing predictive variability stems from two key design mechanisms: reuse of training observations and alignment of data-adaptive partitions. Crucially, this creates a strict covariance floor, meaning predictive variability cannot be eliminated by simply increasing the number of trees, challenging a common optimization practice.
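The covariance-floor intuition can be illustrated with the standard variance identity for an average of B identically distributed, pairwise-correlated predictions. This is textbook ensemble algebra, not the paper's exact finite-sample decomposition; the per-tree variance sigma2 and correlation rho below are hypothetical values chosen for illustration.

```python
import numpy as np

# For B tree predictions, each with variance sigma2 and pairwise
# correlation rho (induced by shared training data and aligned
# partitions), the variance of the ensemble average is:
#   Var(mean) = rho*sigma2 + (1 - rho)*sigma2 / B
# As B grows, the second term vanishes, but rho*sigma2 remains:
# a floor that more trees cannot remove.

sigma2, rho = 1.0, 0.3  # hypothetical per-tree variance and correlation


def ensemble_variance(B, sigma2=sigma2, rho=rho):
    """Variance of the average of B correlated tree predictions."""
    return rho * sigma2 + (1 - rho) * sigma2 / B


for B in (1, 10, 100, 10_000):
    print(B, ensemble_variance(B))
# 1     -> 1.0
# 10    -> 0.37
# 100   -> 0.307
# 10000 -> 0.30007  (approaching the floor rho*sigma2 = 0.3)

# Monte Carlo check: build correlated "tree outputs" from a shared
# component z (common to all trees) plus independent noise e.
rng = np.random.default_rng(0)
n_rep, B = 200_000, 100
z = rng.standard_normal((n_rep, 1))          # shared across trees
e = rng.standard_normal((n_rep, B))          # tree-specific noise
trees = np.sqrt(rho) * z + np.sqrt(1 - rho) * e
print(trees.mean(axis=1).var())              # close to 0.307, not 0
```

No matter how large B is made, the simulated variance of the ensemble mean stays near rho*sigma2; only reducing the correlation between trees (e.g. via more aggressive randomization) can lower the floor itself.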

Why It Matters

This changes how data scientists optimize random forests: because trees share training data and correlated partitions, adding more trees reduces predictive variance only down to a strict floor.