Random Forests as Statistical Procedures: Design, Variance, and Dependence
A new statistical framework proves that prediction variance can't be eliminated just by adding more trees.
A new paper reframes random forests as explicit finite-sample statistical designs rather than just algorithms. It provides an exact variance decomposition showing that predictive variability stems from two design mechanisms: reuse of the same training observations across trees and alignment of their data-adaptive partitions. Because every tree is grown from the same training sample, tree predictions remain positively correlated, and this creates a strict covariance floor: averaging more trees cannot push predictive variability below that shared covariance, challenging the common practice of growing ever-larger ensembles to drive variance down.
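The floor is easiest to see in the classical decomposition for an average of B identically distributed, pairwise-correlated trees: the ensemble variance is rho·sigma² + (1 − rho)·sigma²/B, where sigma² is a single tree's variance and rho the between-tree correlation. Only the second term vanishes as B grows; the correlated part rho·sigma² remains. The following is a minimal simulation sketch, not the paper's construction, that makes this visible by estimating the variance of a forest's prediction at one test point across many independent training sets, for increasing ensemble sizes. The data-generating function `draw_training_set` and all parameter choices are illustrative assumptions; the forest is scikit-learn's `RandomForestRegressor`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def draw_training_set(n=200):
    """One synthetic training set: y = sin(4x) plus Gaussian noise."""
    X = rng.uniform(-1.0, 1.0, size=(n, 1))
    y = np.sin(4.0 * X[:, 0]) + rng.normal(scale=0.3, size=n)
    return X, y

x_test = np.array([[0.25]])  # fixed query point
n_reps = 100                 # independent training sets per ensemble size

for B in (1, 10, 100, 1000):
    preds = np.empty(n_reps)
    for r in range(n_reps):
        X, y = draw_training_set()
        # Every tree in the forest bootstraps from the SAME training sample;
        # that reuse is the dependence that puts a floor under the variance.
        forest = RandomForestRegressor(n_estimators=B, random_state=r)
        forest.fit(X, y)
        preds[r] = forest.predict(x_test)[0]
    print(f"B={B:4d}  Var[prediction at x=0.25] ~ {preds.var():.4f}")
```

On a typical run, the variance drops sharply from B = 1 to B = 100 and then flattens: the remaining spread is the covariance contributed by the shared training sample, which no ensemble size can remove.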
Why It Matters
This changes how data scientists tune random forests: past a certain ensemble size, adding trees yields no further variance reduction, because the residual variance is set by the shared training data rather than by the number of trees.