Causal Direction from Convergence Time: Faster Training in the True Causal Direction
A new principle uses optimization dynamics to infer causal direction, achieving 26/30 correct identifications on synthetic benchmarks.
Researcher Abdulrahman Tamim has introduced a novel principle for causal discovery called Causal Computational Asymmetry (CCA), detailed in the paper 'Causal Direction from Convergence Time: Faster Training in the True Causal Direction.' The core idea is elegantly simple: train one neural network to predict Y from X and another to predict X from Y; the direction whose model converges faster during optimization is inferred to be the true causal direction. This method operates in 'optimization-time space,' distinguishing it from existing techniques such as RESIT or IGCI that rely on statistical independence or distributional asymmetries in the data. The theoretical foundation establishes that under an additive noise model, the reverse (non-causal) direction suffers from irreducible statistical dependencies, creating a higher loss floor and noisier gradients, which strictly slows its convergence.
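In notation (a compact restatement of the additive-noise argument; the symbols and specific forms below are standard conventions rather than expressions quoted from the paper):

```latex
% Additive noise model with cause X and effect Y:
Y = f(X) + N, \qquad N \perp\!\!\!\perp X .
% Causal direction: the optimal regressor g^*(x) = f(x) leaves only the
% independent noise, so the loss floor equals the noise variance:
\min_g \; \mathbb{E}\big[(Y - g(X))^2\big] = \sigma_N^2 .
% Reverse direction: the best regressor is h^*(y) = \mathbb{E}[X \mid Y = y],
% and the residual X - h^*(Y) generally remains dependent on Y (outside
% special cases such as the linear-Gaussian model); the paper argues this
% dependence raises the loss floor and injects noise into the gradients:
\min_h \; \mathbb{E}\big[(X - h(Y))^2\big] = \mathbb{E}\big[\operatorname{Var}(X \mid Y)\big] .
```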
The paper provides formal proofs for this asymmetry and embeds CCA into a broader framework termed Causal Compression Learning (CCL). Empirically, the method demonstrated strong performance on synthetic benchmarks, correctly identifying the causal direction in 26 out of 30 tests across six different neural network architectures, including a perfect 30/30 score on data generated by sine and exponential processes. For the comparison of convergence rates to be valid, both variables must be z-scored so that the two regression problems are measured on a comparable scale. This research opens a new pathway for causal inference by leveraging the intrinsic dynamics of gradient-based optimization, potentially leading to more efficient and scalable tools for uncovering causal structures from observational data.
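As a concrete illustration, here is a minimal sketch of the convergence-time comparison on sine-type synthetic data (using PyTorch; the architecture, learning rate, loss threshold, and sample size are illustrative assumptions, not the paper's experimental configuration):

```python
# Minimal sketch of the CCA procedure on sine-type synthetic data.
# Hyperparameters and the convergence criterion are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

def zscore(a):
    # z-scoring both variables puts the two regression problems on the same
    # scale, so their convergence times can be compared fairly.
    return (a - a.mean()) / a.std()

def epochs_to_converge(inp, tgt, threshold=0.5, max_epochs=500):
    """Train a small MLP to predict tgt from inp; return the first epoch at
    which the full-batch MSE drops below `threshold` (max_epochs if never)."""
    torch.manual_seed(0)  # identical initialization for both directions
    model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()
    for epoch in range(max_epochs):
        opt.zero_grad()
        loss = loss_fn(model(inp), tgt)
        loss.backward()
        opt.step()
        if loss.item() < threshold:
            return epoch
    return max_epochs

# Additive-noise data with true direction X -> Y, here Y = sin(2X) + noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
y = np.sin(2 * x) + 0.3 * rng.normal(size=(1000, 1))
X = torch.tensor(zscore(x), dtype=torch.float32)
Y = torch.tensor(zscore(y), dtype=torch.float32)

t_causal = epochs_to_converge(X, Y)   # fit X -> Y
t_reverse = epochs_to_converge(Y, X)  # fit Y -> X
print(f"epochs X->Y: {t_causal}, epochs Y->X: {t_reverse}")
print("inferred cause:", "X" if t_causal < t_reverse else "Y")
```

Reading off the two epoch counts gives the inferred direction: in the paper's terms, the faster-converging fit is taken to point from cause to effect.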
- Proposes Causal Computational Asymmetry (CCA): the causal direction is the one where a neural network trains faster.
- Achieved 26/30 correct causal identifications on synthetic benchmarks, with perfect scores on sine/exponential data.
- A theoretical proof shows that reverse (anti-causal) models face irreducible statistical dependencies, producing noisier gradients and slower convergence.
Why It Matters
Offers a new, optimization-based tool for causal discovery, a fundamental challenge in AI and data science with applications from healthcare to economics.