Bayesian KANs achieve near-minimax rates in anisotropic Besov spaces
New paper proves KANs nearly match optimal statistical rates at a fixed network depth
A new theoretical paper by Jeunghun Oh, Kyeongwon Lee, Jaeyong Lee, and Lizhen Lin establishes rigorous statistical foundations for Bayesian Kolmogorov-Arnold Networks (KANs). The authors study posterior contraction rates in anisotropic Besov spaces, a function class in which smoothness can differ from one input dimension to another. They show that sparse Bayesian KANs equipped with spike-and-slab priors achieve a near-minimax optimal contraction rate, meaning the posterior distribution concentrates around the true function nearly as fast as is theoretically possible. A key result: unlike deep ReLU networks, the KAN depth can remain fixed because the learnable spline edge functions carry the expressivity burden. Complexity is instead managed through network width, spline-grid range, and parameter sparsity.
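To make the architecture described above concrete, here is a minimal NumPy sketch of a fixed-depth KAN whose edge functions are splines and whose coefficients are drawn from a spike-and-slab prior. It is an illustration under stated assumptions, not the paper's construction: the piecewise-linear (hat) basis, the Gaussian slab, the inclusion probability, the clipping of hidden activations to the grid range, and all function names (`hat_basis`, `sample_spike_and_slab`, `kan_layer`) are choices made here for brevity.

```python
import numpy as np

def hat_basis(x, grid):
    """Piecewise-linear B-spline (hat) basis on a uniform grid.

    x: shape (n,), grid: shape (K,). Returns shape (n, K).
    Rows sum to 1 whenever x lies inside [grid[0], grid[-1]].
    """
    h = grid[1] - grid[0]
    return np.clip(1.0 - np.abs(x[:, None] - grid[None, :]) / h, 0.0, None)

def sample_spike_and_slab(shape, inclusion_prob, slab_scale, rng):
    """Draw coefficients that are exactly zero with prob. 1 - inclusion_prob.

    Assumption: Gaussian slab; the paper's slab distribution may differ.
    """
    mask = rng.random(shape) < inclusion_prob          # spike vs. slab indicator
    slab = rng.normal(0.0, slab_scale, size=shape)     # slab draws
    return mask * slab

def kan_layer(x, coef, grid):
    """One KAN layer: out[:, j] = sum_i phi_ij(x[:, i]), each phi_ij a spline.

    x: (n, d_in), coef: (d_in, d_out, K) spline coefficients per edge.
    """
    out = np.zeros((x.shape[0], coef.shape[1]))
    for i in range(x.shape[1]):
        basis = hat_basis(x[:, i], grid)               # (n, K)
        out += basis @ coef[i].T                       # (n, K) @ (K, d_out)
    return out

rng = np.random.default_rng(0)
grid = np.linspace(-1.0, 1.0, 9)   # spline-grid range and resolution
widths = [3, 5, 1]                 # depth stays fixed at two layers; width can vary
coefs = [
    sample_spike_and_slab((widths[l], widths[l + 1], grid.size), 0.3, 1.0, rng)
    for l in range(len(widths) - 1)
]

x = rng.uniform(-1.0, 1.0, size=(4, widths[0]))
h = x
for c in coefs:
    # Assumption: clip activations so they stay inside the spline grid.
    h = kan_layer(np.clip(h, grid[0], grid[-1]), c, grid)
y = h
print(y.shape)
print("fraction of nonzero coefficients:", np.mean(coefs[0] != 0))
```

In this sketch the three complexity controls named in the text map onto `widths`, `grid`, and the inclusion probability passed to `sample_spike_and_slab`, while the number of layers never changes.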
Crucially, the paper develops tailored approximation and complexity bounds for sparse spline-edge architectures, then extends the analysis to compositional Besov spaces. In that setting, contraction rates depend on layerwise smoothness and effective dimension, showing that KANs can mitigate the curse of dimensionality when the target function has compositional structure. This work provides the first Bayesian perspective on KANs, which were previously studied mainly through approximation theory or non-Bayesian optimization, and offers a principled prior framework for uncertainty quantification in neural networks. The theoretical tools developed here could guide future practical Bayesian inference with KANs on high-dimensional, heterogeneous data.
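A small numerical illustration may help show why dependence on an effective dimension matters. The sketch below uses the classical nonparametric rate n^(-s/(2s+d)) and its standard anisotropic analogue, in which the exponent is built from the combined smoothness s~ defined by 1/s~ = sum_j 1/s_j. These are textbook forms, not the paper's exact statements (which may include logarithmic factors and layerwise terms), and treating the effective dimension as a drop-in replacement for d is an assumption made purely for illustration. The function names are hypothetical.

```python
def rate_exponent(smoothness, dim):
    """Exponent r in the classical rate n**(-r), with r = s / (2s + d)."""
    return smoothness / (2.0 * smoothness + dim)

def anisotropic_exponent(smoothness_per_dim):
    """Exponent s~ / (2 s~ + 1), where 1/s~ = sum_j 1/s_j.

    Reduces to rate_exponent(s, d) when all d coordinates share smoothness s.
    """
    s_tilde = 1.0 / sum(1.0 / s for s in smoothness_per_dim)
    return s_tilde / (2.0 * s_tilde + 1.0)

n = 10_000

# Full input dimension 100 versus an assumed effective dimension of 3.
full = rate_exponent(smoothness=2.0, dim=100)
effective = rate_exponent(smoothness=2.0, dim=3)
print("rate with d = 100:", n ** (-full))
print("rate with effective dimension 3:", n ** (-effective))

# Anisotropy: very smooth directions barely slow the rate down.
print("isotropic, s = 1 in 4 dims:", anisotropic_exponent([1.0] * 4))
print("one rough, three smooth dims:", anisotropic_exponent([1.0, 8.0, 8.0, 8.0]))
```

A larger exponent means faster contraction, so under these assumptions both a small effective dimension and high smoothness along most coordinates improve the rate.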
- Sparse Bayesian KANs with spike-and-slab priors achieve near-minimax posterior contraction in anisotropic Besov spaces
- KAN depth remains fixed; expressivity is controlled via width, spline-grid range, and sparsity, unlike deep ReLU networks
- Extended to compositional Besov spaces, contraction rates depend on layerwise smoothness and effective dimension, mitigating the curse of dimensionality
Why It Matters
The first Bayesian theory for KANs shows they can nearly match optimal rates at a fixed depth, which matters for uncertainty-aware deep learning on high-dimensional data.