New paper explains why neural network curvature varies by layer type

arXiv cs.LG June 03, 2026

Deep Dive

A new arXiv preprint (arXiv:2606.02596) by Anherutowa Calvo gives an exact decomposition of the curvature exponent α in neural network loss landscapes. The paper proves the Spectral Alignment Decomposition, α = 2 + d log Φ_k / d log σ_k, where Φ_k measures the alignment between Kronecker factor eigenbases and gradient singular directions. This explains why α varies systematically by layer type: α≈2 for convolutions, α≈1 for transformer attention, and α<1 for MLP up-projections. The decomposition implies a spectral transfer identity, s = αγ, linking the curvature exponent α, the effective gradient rank-decay exponent γ, and the Hessian decay exponent s. Empirically, fitting α and γ on independent data (Hessian-vector products, HVPs, vs. SVD) recovers s to ~2% median error across 93 layers, five architectures, and three datasets, with no free parameters.
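To make the two formulas concrete, here is a minimal sketch on synthetic data. It is not the paper's fitting procedure. It assumes pure power laws that this summary does not spell out: σ_k ∝ k^-γ, Φ_k ∝ σ_k^β, and a curvature spectrum λ_k = σ_k² Φ_k. The names synthetic_layer, fit_exponents, and beta are hypothetical. Under those assumptions the decomposition gives α = 2 + β, and the fitted exponents should satisfy s = αγ.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

def synthetic_layer(gamma, beta, n=64):
    """Hypothetical layer (assumed forms, not taken from the paper):
    sigma_k = k^-gamma, Phi_k = sigma_k^beta, lambda_k = sigma_k^2 * Phi_k."""
    ranks = list(range(1, n + 1))
    sigma = [k ** -gamma for k in ranks]
    phi = [s ** beta for s in sigma]
    lam = [s * s * p for s, p in zip(sigma, phi)]
    return ranks, sigma, phi, lam

def fit_exponents(ranks, sigma, phi, lam):
    """Fit each exponent from its own log-log slope."""
    gamma_hat = -loglog_slope(ranks, sigma)      # rank decay of sigma_k
    alpha_hat = 2.0 + loglog_slope(sigma, phi)   # alpha = 2 + dlogPhi/dlogsigma
    s_hat = -loglog_slope(ranks, lam)            # rank decay of lambda_k
    return alpha_hat, gamma_hat, s_hat

# beta = -1 gives alpha = 1, the value the article quotes for attention layers.
ranks, sigma, phi, lam = synthetic_layer(gamma=0.8, beta=-1.0)
alpha_hat, gamma_hat, s_hat = fit_exponents(ranks, sigma, phi, lam)
rel_err = abs(s_hat - alpha_hat * gamma_hat) / s_hat
print(f"alpha={alpha_hat:.3f} gamma={gamma_hat:.3f} s={s_hat:.3f} rel_err={rel_err:.2e}")
```

On noiseless synthetic data the identity holds to rounding error by construction. The paper's reported ~2% median error concerns real layers, where α and γ are fitted from separate measurements.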

As a proof of concept, the author derives the architecture-adaptive preconditioner T(σ;α) and shows that Spectral Newton, which applies T in the gradient singular basis, outperforms AdamW on vision benchmarks where α≈2. The paper also includes a zeta-function bound on the participation ratio, showing that curvature concentrates onto effectively one direction per layer. Together, these results give a theoretical account of why different layers have different curvature properties and an optimizer that uses that account to improve training efficiency.
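The link between a zeta function and "effectively one direction" can be illustrated with the standard participation ratio, PR = (Σ λ_k)² / Σ λ_k². The sketch below does not reproduce the paper's bound. It assumes a pure power-law spectrum λ_k = k^-s, for which the infinite-spectrum limit is ζ(s)² / ζ(2s). The function names are hypothetical, and the closed-form values of ζ(2), ζ(4), and ζ(8) are the classical ones.

```python
import math

def participation_ratio(eigs):
    """Standard participation ratio: (sum of eigenvalues)^2 / sum of squares."""
    total = sum(eigs)
    return total * total / sum(e * e for e in eigs)

def power_law_spectrum(s, n):
    """Assumed spectrum lambda_k = k^-s for k = 1..n (illustrative only)."""
    return [k ** -s for k in range(1, n + 1)]

# Classical closed forms for the Riemann zeta function at even integers.
ZETA_2 = math.pi ** 2 / 6
ZETA_4 = math.pi ** 4 / 90
ZETA_8 = math.pi ** 8 / 9450

# Infinite-spectrum limits zeta(s)^2 / zeta(2s) for s = 2 and s = 4.
limit_s2 = ZETA_2 ** 2 / ZETA_4   # equals 5/2
limit_s4 = ZETA_4 ** 2 / ZETA_8   # equals 7/6

pr_s2 = participation_ratio(power_law_spectrum(2, 10_000))
pr_s4 = participation_ratio(power_law_spectrum(4, 10_000))

print(f"s=2: finite PR={pr_s2:.4f}, limit={limit_s2:.4f}")
print(f"s=4: finite PR={pr_s4:.4f}, limit={limit_s4:.4f}")
```

Under this assumed spectrum, a steeper decay pushes the participation ratio toward 1: the limit is 5/2 for s = 2 and 7/6 for s = 4, so nearly all curvature sits in the leading direction.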

Key Points

Spectral Alignment Decomposition: α = 2 + d log Φ_k / d log σ_k, explaining curvature variation across layer types
Spectral transfer identity s = αγ validated with ~2% median error across 93 layers, 5 architectures, 3 datasets
New Spectral Newton optimizer outperforms AdamW on vision benchmarks where α≈2

Why It Matters

The decomposition gives a theoretical foundation for layer-specific curvature and motivates architecture-adaptive optimizers. The reported gains over AdamW are for Spectral Newton on vision benchmarks where α≈2.
