Research & Papers

Deep ReLU networks beat curse of dimensionality with new approximation rates

New rates for anisotropic Besov functions achieve O((WL)^{-2\tilde{s}}) without dimension penalty

Deep Dive

A new paper by Yunfei Yang and Jun Fan (arXiv:2605.31152) tackles a fundamental challenge in deep learning theory: how efficiently deep ReLU networks can approximate and learn high-dimensional functions with varying smoothness across dimensions. Standard approximation rates suffer from the curse of dimensionality, where error scales poorly with input dimension d. The authors extend previous results for isotropic Besov spaces to anisotropic and mixed smooth function classes, proving that deep ReLU networks achieve rates independent of d when the mean smoothness or mixed smoothness is sufficiently high.

For anisotropic Besov spaces, they establish a rate of O((WL)^{-2\tilde{s}}) where \tilde{s} is the harmonic mean of the directional smoothness parameters. This means networks with width W and depth L can approximate functions that are smoother in some directions than others without exponential penalties. For mixed smooth Besov spaces (functions with dominating mixed derivatives), they achieve O((WL)^{-2s}) up to logarithmic factors. As an application, they show these bounds lead to minimax optimal learning rates for a variety of smoothness classes, providing a theoretical foundation for why deep ReLU networks excel on real-world data that often has anisotropic smoothness (e.g., images, time series).

Key Points
  • Proved approximation rate O((WL)^{-2\tilde{s}}) for anisotropic Besov spaces, where \tilde{s} is harmonic mean of directional smoothness
  • Achieved rate O((WL)^{-2s}) for mixed smooth Besov spaces, overcoming curse of dimensionality
  • Deep ReLU networks can achieve minimax optimal rates (up to log factors) for broad classes of smooth functions

Why It Matters

Validates deep ReLU networks as theoretically optimal for high-dimensional data with uneven smoothness—critical for computer vision and scientific computing.