
Neural network approximation theory survey covers 40 years of depth-width trade-offs and KANs

From classic universal approximation to Kolmogorov–Arnold Networks: a deep dive into neural expressivity.

Deep Dive

The paper 'Approximation Theory for Neural Networks: Old and New' by Soumendu Sundar Mukherjee and Himasish Talukdar surveys four decades of theoretical progress. It first revisits the classical universal approximation theorems. These show that, under mild conditions on the activation function (for example, that it is continuous and not a polynomial), feedforward networks can approximate any continuous function on a compact set to arbitrary accuracy, and analogous results hold for functions in L^p and Sobolev spaces. The survey then turns to quantitative bounds: how the approximation error scales with network width and depth and with the smoothness assumed of the target function. These include results showing that, for certain structured function classes, deeper networks reach a given accuracy with substantially fewer parameters than shallow ones. Such results formalize the practical intuition that adding depth often pays off more than adding width.
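To make the width side of these bounds concrete, here is a minimal sketch. It assumes NumPy; `shallow_relu_interpolant` and `sup_error` are hypothetical helper names, and the construction is a standard textbook one rather than anything taken from the paper. A one-hidden-layer ReLU network with n units can reproduce the piecewise-linear interpolant of a 1-D target at n + 1 uniform knots. For a twice-differentiable target, the sup-norm error is then at most h^2 / 8 * max|f''| with h = 1/n, so it decays like 1/n^2.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def shallow_relu_interpolant(f, width):
    """Hypothetical helper: a one-hidden-layer ReLU net with `width` units that
    linearly interpolates f at width + 1 uniform knots on [0, 1]."""
    knots = np.linspace(0.0, 1.0, width + 1)
    values = f(knots)
    slopes = np.diff(values) / np.diff(knots)   # slope on each of the `width` intervals
    # Unit i switches on at knot i; its output weight is the change in slope there.
    biases = knots[:-1]
    out_weights = np.concatenate(([slopes[0]], np.diff(slopes)))

    def net(x):
        x = np.asarray(x, dtype=float)
        hidden = relu(x[..., None] - biases)    # hidden activations, shape (..., width)
        return values[0] + hidden @ out_weights

    return net


def sup_error(f, net, n_test=10_001):
    """Approximate the sup-norm error on [0, 1] using a fine test grid."""
    xs = np.linspace(0.0, 1.0, n_test)
    return float(np.max(np.abs(f(xs) - net(xs))))


if __name__ == "__main__":
    target = lambda x: np.sin(2 * np.pi * x)
    for width in (4, 8, 16, 32, 64):
        err = sup_error(target, shallow_relu_interpolant(target, width))
        print(f"width={width:3d}  sup error={err:.2e}")
```

The printed error should shrink by roughly a factor of four each time the width doubles. That pattern matches the 1/n^2 interpolation bound for smooth one-dimensional targets, a simple instance of the width-versus-smoothness trade-offs the survey quantifies.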

Beyond standard feedforward networks, the survey covers recent work on Kolmogorov–Arnold Networks (KANs). Inspired by the Kolmogorov–Arnold representation theorem, KANs replace fixed activation functions on nodes with learnable univariate functions on edges, which may offer better interpretability and fewer parameters for certain tasks. Their theoretical properties, such as approximation rates and expressivity, are now an active research topic. At 31 pages with 4 figures, the paper is a useful reference for researchers who want both the foundations and the current frontier of neural network approximation theory.
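The Kolmogorov–Arnold theorem states that every continuous f on [0, 1]^n can be written as f(x_1, ..., x_n) = sum_{q=0}^{2n} Phi_q( sum_{p=1}^{n} phi_{q,p}(x_p) ), with continuous univariate functions Phi_q and phi_{q,p}. A KAN layer keeps that "sum of univariate functions" shape but makes each edge function learnable. The sketch below is a simplified, hypothetical `KANLayer` assuming NumPy. Its edge functions are piecewise-linear splines on a fixed uniform grid, whose values at the knots are the trainable weights. This is an assumption made for brevity: KAN implementations typically use higher-order B-splines, and training code is omitted here.

```python
import numpy as np


class KANLayer:
    """Hypothetical minimal KAN layer: edge (input i -> output j) carries its own
    univariate function phi_{j,i}; output j is y_j = sum_i phi_{j,i}(x_i).
    Each phi is a piecewise-linear spline on a uniform grid over [lo, hi],
    parameterized by its values at the grid knots."""

    def __init__(self, in_dim, out_dim, grid_size=8, lo=-1.0, hi=1.0, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.grid = np.linspace(lo, hi, grid_size + 1)     # knot positions
        # coef[j, i, g] = value of phi_{j,i} at knot g (the learnable weights)
        self.coef = 0.1 * rng.standard_normal((out_dim, in_dim, grid_size + 1))

    def basis(self, x):
        """Hat-function basis of shape (batch, in_dim, n_knots); inputs are
        clipped to [lo, hi] so the splines are only evaluated on their grid."""
        h = self.grid[1] - self.grid[0]
        x = np.clip(x, self.grid[0], self.grid[-1])
        return np.maximum(0.0, 1.0 - np.abs(x[..., None] - self.grid) / h)

    def __call__(self, x):
        """x: (batch, in_dim) -> (batch, out_dim)."""
        B = self.basis(x)
        # phi_{j,i}(x_i) = sum_g coef[j, i, g] * B[b, i, g]; then sum over inputs i.
        return np.einsum("big,jig->bj", B, self.coef)

    def num_params(self):
        return self.coef.size


if __name__ == "__main__":
    layer = KANLayer(in_dim=2, out_dim=1, grid_size=16)
    # Hand-set the edge functions so the layer computes x1 + x2**2,
    # exactly at grid knots and up to spline error elsewhere.
    layer.coef[0, 0] = layer.grid          # phi_{0,0}(t) = t
    layer.coef[0, 1] = layer.grid ** 2     # phi_{0,1}(t) interpolates t**2
    print(layer(np.array([[0.5, -0.25], [0.1, 0.9]])))
```

Here the parameter count is out_dim * in_dim * (grid_size + 1). Approximation-rate questions for KANs therefore ask how the error falls as both the grid resolution and the number of layers grow.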

Key Points
  • Classical universal approximation theorems guarantee that neural networks can approximate broad function classes; the survey also reviews quantitative error bounds that tie approximation accuracy to network size.
  • Depth can beat width: for certain structured function classes, deeper networks reach a given accuracy with substantially fewer parameters than shallow ones, a key architectural insight.
  • A section on Kolmogorov–Arnold Networks (KANs) reviews the emerging approximation theory for this alternative to standard feedforward architectures.
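A classic illustration of depth beating width is the sawtooth construction, familiar from depth-separation results such as Telgarsky's. It is not necessarily the construction the survey presents. The tent map T(x) = 2*relu(x) - 4*relu(x - 1/2) on [0, 1] is a width-2 ReLU layer, and composing it k times gives a depth-k network with 2^k linear pieces. A one-hidden-layer ReLU network on the real line with m units has at most m + 1 linear pieces, so it needs at least 2^k - 1 units to reproduce T^k exactly. The sketch assumes NumPy, and the function names are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def tent(x):
    """Tent map on [0, 1] as a width-2 ReLU layer: 2*relu(x) - 4*relu(x - 1/2)."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_sawtooth(x, depth):
    """Compose the tent map `depth` times: a ReLU net with `depth` hidden
    layers of 2 units each (2 * depth hidden units in total)."""
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_pieces(fn, resolution_bits=16):
    """Count linear pieces of a piecewise-linear fn on [0, 1], sampling on a
    dyadic grid. Assumes every breakpoint lies on the grid (true for the
    sawtooth whenever depth <= resolution_bits)."""
    xs = np.arange(2 ** resolution_bits + 1) / 2 ** resolution_bits
    slopes = np.diff(fn(xs)) / np.diff(xs)
    return 1 + int(np.count_nonzero(~np.isclose(slopes[1:], slopes[:-1])))


if __name__ == "__main__":
    for depth in range(1, 9):
        pieces = count_linear_pieces(lambda x: deep_sawtooth(x, depth))
        print(f"depth={depth}  deep units={2 * depth:2d}  pieces={pieces:4d}  "
              f"min shallow units={pieces - 1}")
```

The deep network's unit count grows linearly in depth while the shallow requirement grows exponentially. Known depth-separation results extend this counting idea from exact representation to approximation.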

Why It Matters

Approximation-theoretic results inform practical architecture decisions in deep learning, such as how to trade width for depth, and are shaping the design of newer models like KANs.