The Generalised Kernel Covariance Measure
New conditional independence test swaps slow kernel ridge regression for flexible tree-based models, improving speed and error control.
A team of researchers including Luca Bergen has introduced the Generalised Kernel Covariance Measure (GKCM), a new conditional independence (CI) test for causal discovery. A CI test determines whether two variables are independent given a third variable or set of variables, a foundational step in building causal models from data. Existing state-of-the-art kernel-based methods embed data into high-dimensional spaces, but they rely on kernel ridge regression, which is slow to tune and can yield poorly calibrated tests when left untuned. GKCM removes this dependency by offering a framework that works with a broad class of regression estimators.
By building on the Generalised Hilbertian Covariance Measure, the authors provide theoretical guarantees for GKCM's performance. In simulations, pairing GKCM with efficient tree-based regression models (such as random forests or gradient boosting) allowed it to frequently outperform other CI tests: it controlled the Type I error rate (false positives) better and delivered competitive or higher statistical power (true positive rate) across diverse data scenarios. This combination of flexibility, speed, and reliability addresses a major pain point and makes robust causal inference a more practical tool for data scientists and AI researchers working on explainable and trustworthy systems.
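To make the regression-agnostic idea concrete, here is a minimal sketch of a residual-based CI test. It is not GKCM itself: it implements the simpler scalar Generalised Covariance Measure, a relative from the same family, which regresses X and Y on Z with any regressor and checks whether the products of the two residuals average to zero. The function names, the synthetic data, and the binned-mean regressor (a crude stand-in for a tree-based model, chosen only to keep the example dependency-free) are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def binned_residuals(z, target, n_bins=20):
    """Residuals of `target` after a piecewise-constant fit on scalar `z`.

    Hypothetical stand-in for a tree-based regressor: sort by z, cut the
    sample into equal-count bins, and predict each bin's mean (in-sample).
    """
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    residuals = [0.0] * n
    for b in range(n_bins):
        chunk = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        bin_mean = sum(target[i] for i in chunk) / len(chunk)
        for i in chunk:
            residuals[i] = target[i] - bin_mean
    return residuals


def gcm_test(x, y, z, residual_fn=binned_residuals):
    """Scalar Generalised Covariance Measure test of X independent of Y given Z.

    Returns (statistic, two-sided p-value). The statistic is approximately
    standard normal under the null when both regressions are accurate enough.
    """
    rx = residual_fn(z, x)  # part of X not explained by Z
    ry = residual_fn(z, y)  # part of Y not explained by Z
    prod = [a * b for a, b in zip(rx, ry)]
    n = len(prod)
    mean = sum(prod) / n
    var = sum(p * p for p in prod) / n - mean ** 2
    stat = math.sqrt(n) * mean / math.sqrt(var)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value


# Synthetic data (assumed for illustration): Z drives both X and Y.
rng = random.Random(0)
n = 1000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [math.sin(zi) + rng.gauss(0, 0.5) for zi in z]
# Conditional independence holds: Y depends on Z only.
y_ind = [math.cos(zi) + rng.gauss(0, 0.5) for zi in z]
# Conditional independence is violated: Y also depends on X directly.
y_dep = [math.cos(zi) + 0.5 * xi + rng.gauss(0, 0.5) for zi, xi in zip(z, x)]

stat_ind, p_ind = gcm_test(x, y_ind, z)
stat_dep, p_dep = gcm_test(x, y_dep, z)
print(f"independent case: stat={stat_ind:.2f}, p={p_ind:.3f}")
print(f"dependent case:   stat={stat_dep:.2f}, p={p_dep:.3g}")
```

Because the regressor is passed in as `residual_fn`, swapping the binned-mean fit for a random forest or gradient boosting model only requires a function with the same `(z, target)` signature that returns residuals. That pluggability is the property the article attributes to GKCM, which applies it in a kernel setting rather than to scalar residual products.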
- GKCM is a regression-agnostic CI test that removes the dependence on slow kernel ridge regression, enabling the use of faster models such as random forests.
- In simulations, it achieved better Type I error control and competitive power versus state-of-the-art methods across diverse data types.
- The work, accepted at CLeaR 2026, provides a practical tool for more efficient and reliable causal discovery in machine learning.
Why It Matters
It makes robust causal discovery, a key ingredient of trustworthy AI and scientific modeling, faster and more practical for real-world data.