Research & Papers

New spectral distillation shrinks tree ensembles by orders of magnitude

Random forests can now be compressed by about 99% in size while keeping predictive accuracy competitive.

Deep Dive

Researchers Binh Duc Vu and David Watson published a paper on arXiv analyzing tree ensembles (random forests, or RFs, and gradient boosting machines, or GBMs) from a spectral perspective. Their first contribution derives minimax-optimal convergence rates for random forest regression under mild regularity assumptions on tree growth, showing that the eigenvalue decay of the induced kernel operator governs statistical performance. This provides a long-sought theoretical grounding for why random forests work so well.
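For readers who want the phrase "eigenvalue decay of the kernel operator" spelled out, the block below writes down the standard objects involved. It is an illustration drawn from classical kernel regression theory, not a statement of the paper's theorem: the symbols K, P, the eigenpairs, and the decay exponent alpha are assumptions introduced here, and the rate shown is the textbook benchmark for a unit ball of a reproducing kernel Hilbert space under polynomial decay.

```latex
% Kernel integral operator and its spectral (Mercer) decomposition
(T_K f)(x) = \int K(x, x')\, f(x')\, dP(x'), \qquad
K(x, x') = \sum_{j \ge 1} \lambda_j\, \phi_j(x)\, \phi_j(x'), \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge 0.

% Classical RKHS benchmark (illustrative, not the paper's result):
% polynomial eigenvalue decay implies a polynomial minimax rate
\lambda_j \asymp j^{-2\alpha},\ \alpha > \tfrac{1}{2}
\quad\Longrightarrow\quad
\inf_{\hat f}\ \sup_{\|f^*\|_{\mathcal{H}} \le 1}
\mathbb{E}\,\bigl\|\hat f - f^*\bigr\|_{L^2(P)}^2
\asymp n^{-2\alpha/(2\alpha+1)}.
```

The faster the eigenvalues shrink (larger alpha), the closer the rate gets to the parametric 1/n, which is the sense in which spectral decay can govern statistical performance.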

Their second contribution leverages the same spectral view to compress tree ensembles. For RFs, they use leading eigenfunctions of the kernel operator; for GBMs, they use leading singular vectors of the smoother matrix. By learning nonlinear maps for these spectral representations, they create distilled models that are orders of magnitude smaller than the originals while maintaining competitive predictive accuracy. The method beats state-of-the-art forest pruning and rule extraction algorithms, with clear applications for edge devices and resource-constrained computing.
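The spectral idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' algorithm: the "forest" is a hypothetical stand-in built from random oblivious splits rather than a trained RF, the function names (`leaf_codes`, `smoother_matrix`, `truncated_fit`) are invented for this example, and the code works in-sample only, so it omits the paper's step of learning nonlinear maps to the spectral representation. It shows how a forest's predictions can be written as a smoother matrix applied to the targets, and how keeping only the leading eigenvectors approximates those predictions.

```python
import numpy as np

def leaf_codes(X, n_trees, depth, rng):
    """Leaf index of every row of X in each of n_trees random oblivious trees
    (a hypothetical stand-in for a trained forest)."""
    n, d = X.shape
    codes = np.zeros((n, n_trees), dtype=np.int64)
    powers = 1 << np.arange(depth)  # binary weights: 1, 2, 4, ...
    for t in range(n_trees):
        feats = rng.integers(0, d, size=depth)             # one feature per level
        thresh = X[rng.integers(0, n, size=depth), feats]  # thresholds from data
        bits = (X[:, feats] > thresh).astype(np.int64)     # shape (n, depth)
        codes[:, t] = bits @ powers                        # pack bits into a leaf id
    return codes

def smoother_matrix(codes):
    """S such that S @ y gives the forest's in-sample predictions
    (each tree predicts the mean of y within a leaf; trees are averaged)."""
    n, n_trees = codes.shape
    S = np.zeros((n, n))
    for t in range(n_trees):
        same = codes[:, t][:, None] == codes[:, t][None, :]  # same-leaf indicator
        S += same / same.sum(axis=1, keepdims=True)          # divide by leaf size
    return S / n_trees

def truncated_fit(S, y, k):
    """Approximate S @ y using only the k leading eigenpairs of S."""
    evals, evecs = np.linalg.eigh(S)       # S is symmetric, so eigh applies
    top = np.argsort(evals)[::-1][:k]      # indices of the k largest eigenvalues
    U, lam = evecs[:, top], evals[top]
    coef = lam * (U.T @ y)                 # k spectral coefficients
    return U @ coef

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

codes = leaf_codes(X, n_trees=50, depth=4, rng=rng)
S = smoother_matrix(codes)
full = S @ y                               # predictions of the full stand-in forest

for k in (5, 20, 200):
    approx = truncated_fit(S, y, k)
    rmse = float(np.sqrt(np.mean((approx - full) ** 2)))
    print(f"k={k:3d}  RMSE vs. full forest: {rmse:.4f}")
```

With k equal to the number of training points the truncation reproduces the full predictions, and the gap to the full forest cannot grow as k increases. A deployable distilled model would additionally need a learned map from new inputs to the leading spectral coordinates, which is the part of the method this sketch leaves out.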

Key Points
  • Derived minimax-optimal convergence rates for random forest regression using eigenvalue decay of the kernel operator
  • Spectral distillation compresses RFs via eigenfunctions and GBMs via singular vectors, producing models orders of magnitude smaller
  • Outperforms state-of-the-art forest pruning and rule extraction methods on benchmark datasets

Why It Matters

This brings deployable, high-performance tree ensembles to edge devices and low-resource environments while keeping predictive accuracy competitive with the original models.