Bifurcations collapse neural tangent kernel to rank-one, simplifying RNN training
New theory reveals training dynamics simplify drastically near critical transitions in RNNs.
A new paper from James Hazelden and Eric Shea-Brown (UW) develops a local theory of gradient descent near bifurcations, the qualitative changes in a recurrent network's dynamics. They introduce the empirical state-space neural tangent kernel (sNTK) and show that as a network approaches a bifurcation, the sNTK collapses to a rank-one operator. This collapse dominates the learning landscape: gradient descent is funneled into a few critical dynamical directions, making the loss geometry predictable from classical normal form theory. In a student-teacher RNN experiment, the first learned bifurcation coincided with a sharp drop in the sNTK's effective rank, and the dominant parameter direction matched the scalar pitchfork normal form.
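For readers unfamiliar with the pitchfork normal form, the sketch below integrates the scalar system dx/dt = mu*x - x**3 and measures how strongly its stable fixed point depends on the parameter mu. It assumes the supercritical form of the pitchfork; the function names, step size, and parameter values are illustrative choices, not taken from the paper. For mu > 0 the stable fixed point is sqrt(mu), so its sensitivity 1/(2*sqrt(mu)) grows without bound as mu approaches the bifurcation at 0, which is one simple way to see why gradients can behave badly near a critical transition.

```python
import numpy as np

def settle(mu, x0=0.5, dt=0.05, steps=10_000):
    """Euler-integrate dx/dt = mu*x - x**3 (assumed supercritical pitchfork)
    from x0 and return the final state."""
    x = x0
    for _ in range(steps):
        x = x + dt * (mu * x - x**3)
    return x

def sensitivity(mu, h=1e-3):
    """Central finite-difference estimate of d(x*)/d(mu) for the settled state."""
    return (settle(mu + h) - settle(mu - h)) / (2 * h)

# Illustrative parameter values: one close to the bifurcation at mu = 0, one far from it.
sens_near = sensitivity(0.05)
sens_far = sensitivity(1.0)

# Analytic sensitivity of the fixed point sqrt(mu) for comparison.
analytic_near = 1.0 / (2.0 * np.sqrt(0.05))

print(f"near bifurcation: {sens_near:.3f} (analytic {analytic_near:.3f})")
print(f"far from bifurcation: {sens_far:.3f}")
```

The finite-difference estimate tracks the analytic value, and the sensitivity near mu = 0.05 is several times larger than at mu = 1.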
The authors also demonstrate that low-rank natural gradient methods can resolve the learning instability that arises near bifurcations with very little computational overhead over standard SGD. This provides a principled way to stabilize training of recurrent models that must learn to pass through phase transitions (e.g., in time-series forecasting, motor control, or neural dynamics). The work bridges dynamical systems and deep learning theory, offering a tractable mathematical framework for understanding feature learning in time-dependent tasks.
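A generic low-rank natural gradient step can be sketched as follows. This is not the authors' algorithm: it is a textbook-style damped Gauss-Newton step restricted to the top-k singular directions of a Jacobian, and the function name, the damping value, and the synthetic Jacobian `J` (one dominant direction plus small noise, standing in for a network near a bifurcation) are all assumptions made for illustration.

```python
import numpy as np

def low_rank_natural_gradient(J, residual, k=1, damping=1e-2):
    """Approximate (J^T J + damping*I)^{-1} J^T residual using only the
    top-k right singular directions of J (hypothetical helper)."""
    grad = J.T @ residual
    _, s, Vt = np.linalg.svd(J, full_matrices=False)
    Vk, sk = Vt[:k].T, s[:k]
    coeffs = Vk.T @ grad
    # Inside the top-k subspace, rescale by the damped curvature.
    in_span = Vk @ (coeffs / (sk**2 + damping))
    # Outside it, the rank-k curvature is zero, so only damping remains.
    out_of_span = (grad - Vk @ coeffs) / damping
    return in_span + out_of_span

rng = np.random.default_rng(0)
u = rng.standard_normal(8)
u /= np.linalg.norm(u)
v = rng.standard_normal(30)
v /= np.linalg.norm(v)

# Synthetic Jacobian: one very large critical direction plus small noise.
J = 1e3 * np.outer(u, v) + 0.01 * rng.standard_normal((8, 30))
residual = u + 0.1 * rng.standard_normal(8)

plain_grad = J.T @ residual
nat_step = low_rank_natural_gradient(J, residual, k=1)

print("plain gradient norm:", np.linalg.norm(plain_grad))
print("low-rank natural gradient norm:", np.linalg.norm(nat_step))
```

In this toy setting the plain gradient is dominated by the single large direction, while the preconditioned step stays moderate in size. The full SVD is used only for clarity; keeping the overhead small in practice would require estimating the top directions more cheaply, for example with a few power iterations.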
- Near codimension-1 bifurcations, the state-space NTK reduces to a rank-one operator, dramatically simplifying training dynamics.
- In student-teacher RNN experiments, the effective rank of the sNTK collapsed as the network learned its first bifurcation.
- Low-rank natural gradient methods stabilize training near bifurcations with minimal overhead vs. SGD.
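To make "effective rank" in the points above concrete, here is a minimal sketch of one common definition: the exponential of the Shannon entropy of the normalized eigenvalues. The summary does not say which definition the paper uses, so this choice is an assumption, as are the function name and the synthetic Jacobians. The kernel is taken to have Gram form K = J J^T, where J is a hypothetical Jacobian of states with respect to parameters.

```python
import numpy as np

def effective_rank(K, eps=1e-12):
    """Entropy-based effective rank of a symmetric PSD matrix
    (assumed definition: exp of the entropy of normalized eigenvalues)."""
    eigvals = np.linalg.eigvalsh((K + K.T) / 2)
    eigvals = np.clip(eigvals, 0.0, None)   # drop tiny negative round-off
    p = eigvals / eigvals.sum()
    p = p[p > eps]                          # ignore numerically zero modes
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)

# Generic Jacobian (synthetic): many directions carry comparable weight.
J_full = rng.standard_normal((20, 50))
rank_full = effective_rank(J_full @ J_full.T)

# Rank-one Jacobian (synthetic): a single direction carries all the weight.
J_rank1 = np.outer(rng.standard_normal(20), rng.standard_normal(50))
rank_one = effective_rank(J_rank1 @ J_rank1.T)

print(f"generic kernel effective rank: {rank_full:.2f}")
print(f"rank-one kernel effective rank: {rank_one:.2f}")
```

A kernel that has collapsed to a single direction scores an effective rank of 1, while a generic kernel of the same size scores much higher, so a sharp drop in this quantity during training is a convenient scalar signal of the collapse.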
Why It Matters
The work provides a principled theory for training RNNs through critical phase transitions, which could improve both training stability and interpretability.