On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking
A new paper explains how two-layer neural networks learn modular addition and describes 'grokking' as a three-stage process.
Researchers Jianliang He, Leda Wang, Siyu Chen, and Zhuoran Yang have published a paper analyzing how two-layer neural networks learn modular addition. They give a full mechanistic interpretation of the trained networks and prove that the networks learn single-frequency Fourier features together with phase alignment. Their analysis formalizes a 'diversification condition' and explains how these features emerge through a lottery ticket mechanism. The work also demystifies grokking as a three-stage process: memorization, followed by two generalization phases driven by the competition between loss minimization and weight decay.
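To make the role of Fourier features concrete, here is a minimal hand-built sketch (not a trained network, and not the paper's construction) of how cosine and sine features at a few frequencies are enough to compute (a + b) mod p. The function name `fourier_mod_add`, its parameters, and the choice of frequencies are assumptions made for illustration, and p is assumed to be an odd prime. The sketch relies only on the angle-addition identities and on the fact that the sum of cos(2πk(a + b − c)/p) over the chosen frequencies k is largest exactly when c = (a + b) mod p.

```python
import numpy as np


def fourier_mod_add(a, b, p, freqs=None):
    """Return (a + b) mod p using only cos/sin features of a and b.

    Illustrative sketch: p is assumed to be an odd prime, and freqs
    (hypothetical parameter) lists the frequencies k to use.
    """
    if freqs is None:
        # Default: every non-redundant frequency 1..(p-1)/2.
        freqs = np.arange(1, (p - 1) // 2 + 1)
    w = 2 * np.pi * np.asarray(freqs) / p

    # Single-frequency Fourier features of each input.
    cos_a, sin_a = np.cos(w * a), np.sin(w * a)
    cos_b, sin_b = np.cos(w * b), np.sin(w * b)

    # Angle-addition identities give the features of a + b.
    cos_ab = cos_a * cos_b - sin_a * sin_b
    sin_ab = sin_a * cos_b + cos_a * sin_b

    # logits[c] = sum_k cos(w_k * (a + b - c)), maximal at c = (a + b) mod p.
    grid = np.outer(w, np.arange(p))
    logits = cos_ab @ np.cos(grid) + sin_ab @ np.sin(grid)
    return int(np.argmax(logits))


if __name__ == "__main__":
    p = 7
    ok = all(fourier_mod_add(a, b, p) == (a + b) % p
             for a in range(p) for b in range(p))
    print("all pairs correct:", ok)
```

Even a single frequency recovers the right answer when p is prime, for example `fourier_mod_add(5, 110, 113, freqs=[3])`, although the margin between the correct output and its neighbors is then much smaller. In this sketch the products of features are written out explicitly; in a trained network that step would have to be produced by the nonlinearity.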
Why It Matters
The work offers fundamental insight into how neural networks learn, which could help in building more efficient and interpretable AI models.