Research & Papers

ILDR: Geometric Early Detection of Grokking

New geometric measure predicts delayed generalization before it happens.

Deep Dive

A new paper from Shreel Golwala introduces ILDR (Inter/Intra-class Distance Ratio), a geometric metric that detects grokking significantly earlier than existing approaches. Grokking is the delayed generalization phenomenon in which a network reaches perfect training accuracy long before validation accuracy improves, followed by an abrupt transition to strong generalization. Current detection signals such as weight norm or GrokFast's gradient EMA are indirect and unstable: across seeds, the standard deviation of GrokFast's lead time exceeds its mean. ILDR addresses this by computing the ratio of inter-class centroid separation to intra-class scatter on penultimate-layer representations, a quantity grounded in Fisher's linear discriminant criterion. It requires no eigendecomposition and runs in O(|C|^2 + N) time, where |C| is the number of classes and N the number of samples.
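The ratio described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the summary does not give the exact formula, so the sketch assumes that separation is the mean pairwise Euclidean distance between class centroids and that scatter is the mean distance from each sample to its own class centroid. The function name `ildr` and the `eps` parameter are hypothetical.

```python
import numpy as np


def ildr(features: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """Inter/intra-class distance ratio under the assumptions stated above.

    features: (N, d) array of penultimate-layer activations.
    labels:   (N,) array of integer class labels.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels).ravel()

    # Map each label to a dense class index 0..k-1.
    classes, inverse = np.unique(labels, return_inverse=True)
    k = classes.size
    if k < 2:
        raise ValueError("ILDR needs at least two classes")

    # Class centroids: accumulate per-class sums, then divide by class counts.
    counts = np.bincount(inverse, minlength=k)
    sums = np.zeros((k, features.shape[1]))
    np.add.at(sums, inverse, features)
    centroids = sums / counts[:, None]

    # Intra-class scatter (assumed form): mean distance to own centroid.
    intra = np.linalg.norm(features - centroids[inverse], axis=1).mean()

    # Inter-class separation (assumed form): mean distance over all
    # unordered pairs of distinct centroids, which is the |C|^2 term.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=-1)
    inter = pairwise[np.triu_indices(k, k=1)].mean()

    # eps guards against division by zero when classes collapse to points.
    return float(inter / (intra + eps))
```

Under these assumptions, tight and well-separated class clusters give a large value, while overlapping clusters give a value near or below 1. No covariance matrix or eigendecomposition is involved, which matches the cost profile the paper reports.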

Evaluated on modular arithmetic and permutation group composition (S5), ILDR leads the grokking transition by 9 to 73 percent of the training budget, with lead time increasing with the algebraic complexity of the task. Over eight random seeds, ILDR leads by 950 ± 250 steps, a coefficient of variation of 26 percent, and post-grokking variance drops by a factor of 1,696, consistent with a sharp phase transition in representation space. Using ILDR as an early stopping trigger reduces training by 18.6 percent on average. Optimizer interventions triggered at the ILDR threshold demonstrate bidirectional control over the transition, suggesting that ILDR tracks the representational conditions underlying generalization rather than a downstream correlate. This geometric approach offers a robust, efficient tool for understanding and accelerating neural network training dynamics.
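To make the idea of a threshold-based trigger concrete, here is a small sketch of how a metric such as ILDR might drive early stopping or an optimizer intervention. The summary does not describe the paper's actual trigger rule, so everything here is an assumption: the class name `ThresholdTrigger`, the `threshold` value, and the `patience` rule (fire only after several consecutive readings at or above the threshold) are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThresholdTrigger:
    """Fires once a monitored value stays at or above a threshold.

    threshold: hypothetical metric level treated as the transition signal.
    patience:  number of consecutive qualifying readings required, an
               assumed safeguard against firing on a single noisy spike.
    """

    threshold: float
    patience: int = 3
    _streak: int = 0

    def update(self, value: float) -> bool:
        # Extend the streak on a qualifying reading, otherwise reset it.
        if value >= self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        return self._streak >= self.patience
```

In a training loop, one would compute the metric every few hundred steps, pass it to `update`, and stop training or adjust the optimizer the first time it returns `True`.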

Key Points
  • ILDR signals grokking ahead of the jump in validation accuracy, leading it by 9% to 73% of the training budget.
  • Across 8 seeds, the metric leads the transition by 950 ± 250 steps, a coefficient of variation of 26%.
  • Using ILDR as an early stopping trigger reduces training by 18.6% on average.

Why It Matters

ILDR offers a reliable, efficient way to detect and control grokking, potentially accelerating neural network training.