Wang et al. Prove First Risk Bounds for KANs with DP-SGD and Correlated Noise
New analysis covers mini-batch DP-SGD with temporally correlated noise, narrowing the gap between KAN theory and practice.
A team of researchers led by Puyu Wang (TU Kaiserslautern), with Jan Schuchardt, Nikita Kalinin, Junyu Zhou, Sophie Fellenz, Christoph Lampert, and Marius Kloft, has published a theoretical analysis of Kolmogorov-Arnold Networks (KANs) trained with differentially private SGD (DP-SGD) using correlated noise. Their paper, posted on arXiv, provides the first population risk bounds for KANs under mini-batch SGD with gradient clipping, covering both the non-private and the DP-SGD settings. Prior work assumed full-batch gradient descent and independent noise. The new analysis instead handles mini-batch SGD, the standard training recipe for modern networks, together with temporally correlated noise, which empirically offers a more favorable privacy-utility tradeoff than independent noise. The results unify and sharpen earlier full-batch, independent-noise bounds for KANs by the same group.
The technical core of the paper addresses two obstacles to analyzing correlated-noise DP training in the non-convex regime. First, temporal dependence breaks the conditional-centering structure of standard one-step SGD arguments. Second, the projection step obstructs exact cancellation of correlated perturbations. The authors overcome these difficulties with three tools: auxiliary unprojected dynamics, a shifted iterate that absorbs the current noise perturbation, and a high-probability bootstrap that certifies when the projection is inactive. Combining this optimization analysis with a stability-based generalization argument yields the stated population risk bounds. To the best of the authors' knowledge, this is the first optimization and population risk analysis of a correlated-noise mechanism for DP training in non-convex learning, a step toward theory that matches how differentially private neural networks are trained in practice.
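To make the training loop under discussion concrete, here is a minimal sketch of projected mini-batch SGD with per-example gradient clipping and temporally correlated Gaussian noise. It rests on several assumptions that are not from the paper: a linear least-squares model stands in for a KAN, the noise is correlated through the simple hypothetical rule `z_t = xi_t - lam * xi_{t-1}`, and `sigma` is not calibrated to any particular privacy budget. All function and parameter names are illustrative.

```python
import numpy as np

def clip_rows(grads, clip_norm):
    """Scale each per-example gradient so its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    return grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

def project_ball(w, radius):
    """Euclidean projection onto the L2 ball of the given radius."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def dp_sgd_correlated(X, y, steps=200, batch=32, lr=0.1, clip_norm=1.0,
                      sigma=0.5, lam=0.5, radius=10.0, seed=0):
    """Projected mini-batch SGD with clipping and correlated noise (sketch).

    Assumption: the model is linear least squares, not a KAN.
    Assumption: sigma is a free knob, not tied to any (epsilon, delta).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    xi_prev = np.zeros(d)  # previous fresh Gaussian draw
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        resid = X[idx] @ w - y[idx]
        # Per-example gradients of 0.5 * (x @ w - y)**2, one row per example.
        per_example = resid[:, None] * X[idx]
        g = clip_rows(per_example, clip_norm).mean(axis=0)
        # Fresh Gaussian draw, scaled to the clipped mean gradient.
        xi = rng.normal(0.0, sigma * clip_norm / batch, size=d)
        # Hypothetical correlation rule: the injected noise at step t
        # depends on the draw from step t - 1 as well as the current one.
        z = xi - lam * xi_prev
        w = project_ball(w - lr * (g + z), radius)
        xi_prev = xi
    return w
```

Setting `lam=0` reduces the sketch to independent-noise DP-SGD. With `lam > 0`, part of the noise injected at one step is subtracted at the next. The projection in `project_ball` is the step that prevents this cancellation from being exact whenever it is active.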
- First population risk bounds for KANs trained with mini-batch SGD (standard practice), not just full-batch GD.
- The bounds cover DP-SGD with temporally correlated Gaussian noise, which empirically offers a better privacy-utility tradeoff than independent noise.
- New mathematical techniques (auxiliary unprojected dynamics, shifted iterates) handle temporal dependence and projection obstructions.
Why It Matters
This brings DP theory closer to how KANs are actually trained: with mini-batches, gradient clipping, and correlated noise, the setting in which private training tends to cost the least utility.