Gradient Regularized Newton Boosting Trees with Global Convergence
Newton-boosted GBDTs (as in XGBoost) may diverge; a new gradient-regularized method guarantees convergence at an O(1/k²) rate.
Gradient Boosting Decision Trees (GBDTs) underpin tabular machine learning in production systems such as XGBoost, LightGBM, and CatBoost, all of which rely on Newton boosting, a second-order descent step in tree space. Despite this empirical success, the global convergence of Newton boosting has remained poorly understood compared to first-order methods. A new paper by Zozoulenko et al. (arXiv:2605.00581) bridges this gap by introducing Restricted Newton Descent, a framework for convex optimization on Hilbert spaces with inexact Newton iterates, built around cosine-angle and weak-gradient-edge conditions. Within this framework, both Newton boosting with GBDTs and classical finite-dimensional Newton theory emerge as special cases.
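As a rough sketch of the setup (in notation chosen here for illustration, not necessarily the paper's): each boosting round takes a Newton-type step in function space, but the update direction must come from the weak learner class, e.g.

$$
f_{k+1} = f_k + \eta_k h_k, \qquad h_k \approx \arg\min_{h \in \mathcal{T}} \Big\{ \langle \nabla L(f_k), h \rangle + \tfrac{1}{2} \langle \nabla^2 L(f_k)\, h,\, h \rangle \Big\},
$$

where $\mathcal{T}$ denotes regression trees of bounded complexity. Cosine-angle and weak-gradient-edge conditions then quantify how well the fitted tree $h_k$ aligns with the exact Newton or gradient direction, and the convergence rates are expressed in terms of these alignment constants.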
For smooth, strongly convex losses satisfying a Hessian-dominance condition, the authors prove that vanilla Newton boosting converges at a linear rate. To handle general convex losses with Lipschitz Hessians, they extend a recent gradient-regularized Newton scheme to the restricted weak learner setting. This scheme minimally modifies classical Newton boosting: at each iteration, it adds an adaptive ℓ2-regularization term proportional to the square root of the gradient norm. The result is a global O(1/k²) convergence rate, matching that of first-order boosting with Nesterov momentum, and making this the first second-order GBDT algorithm with a global convergence guarantee. Numerical experiments demonstrate convergence in settings where vanilla Newton boosting may diverge.
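A minimal sketch of how such a gradient-regularized Newton boosting round could look in practice, assuming a logistic loss and XGBoost-style leaf weights; the helper names and the exact placement of the adaptive ridge term λ_k ∝ sqrt(‖g_k‖) are illustrative assumptions, not the authors' implementation:

```python
# Sketch of gradient-regularized Newton boosting (illustrative, not the paper's code).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def logistic_grad_hess(y, pred):
    """Per-sample gradient and Hessian of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + np.exp(-pred))
    return p - y, p * (1.0 - p)

def regularized_newton_boost(X, y, n_rounds=100, max_depth=3, c_reg=1.0):
    pred = np.zeros(len(y))          # current ensemble prediction (raw scores)
    ensemble = []
    for _ in range(n_rounds):
        g, h = logistic_grad_hess(y, pred)
        # Adaptive l2 term: proportional to the square root of the gradient norm,
        # so it vanishes as the boosted model approaches a stationary point.
        lam = c_reg * np.sqrt(np.linalg.norm(g))
        # Fit a tree to per-sample Newton steps to obtain a partition of the data.
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, -g / (h + lam))
        leaves = tree.apply(X)
        # Recompute each leaf value with the regularized Newton formula
        # (XGBoost-style): w_j = -sum_i g_i / (sum_i h_i + lambda).
        values = {j: -g[leaves == j].sum() / (h[leaves == j].sum() + lam)
                  for j in np.unique(leaves)}
        pred += np.array([values[j] for j in leaves])
        ensemble.append((tree, values))
    return ensemble
```

Because λ_k shrinks with the gradient norm, the regularization fades as the ensemble approaches a stationary point, so the step reverts to an (approximately) vanilla Newton boosting update near the optimum.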
- Vanilla Newton boosting achieves linear convergence for smooth, strongly convex losses under a Hessian-dominance condition.
- A new adaptive ℓ2-regularization term, scaling with the square root of the gradient norm, ensures an O(1/k²) rate for general convex losses.
- Matches the rate of first-order boosting with Nesterov momentum while preventing the divergence vanilla Newton boosting can exhibit in practice.
Why It Matters
Provides stability and convergence guarantees for GBDTs, which is critical for production ML systems built on XGBoost and LightGBM.