
How important are the genes to explain the outcome - the asymmetric Shapley value as an honest importance metric for high-dimensional features

New method tackles collinearity in clinical AI, offering a more honest metric for high-dimensional genomics.

Deep Dive

A team of researchers including Mark van de Wiel and Kjersti Aas has published a paper proposing the asymmetric Shapley value as a more honest metric for quantifying feature importance in high-dimensional clinical prediction models. The work targets a common practice: judging the importance of a complex feature such as genomics by the performance gain it provides when added to a baseline model of traditional clinical variables. That add-on approach ignores collinearity between the feature groups and ignores the known direction of dependencies between variables, for example disease state acting as a mediator of genomic effects, which can lead to misleading interpretations. The new framework is designed for the mixed-dimensional setting common in modern medicine, where a handful of traditional clinical variables sit alongside high-throughput genomic data.
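The definitions behind this argument can be written in the usual permutation form. The notation below is our own, not necessarily the paper's, and it assumes uniform weights over the allowed orderings, which is a common choice. The add-on metric is a single marginal contribution in which genomics is entered last. The Shapley value averages that contribution over every ordering of the feature groups. The asymmetric version averages only over orderings that respect a known precedence. Here we assume genomics G precedes disease state D, reading "disease state mediating genomic effects" as G → D.

```latex
% N: set of feature groups; v(S): performance of a model using only the groups in S
% Add-on importance of the genomic group G (genomics entered last):
\Delta_G = v(N) - v(N \setminus \{G\})

% Shapley value: average marginal contribution over all |N|! orderings \pi,
% where \mathrm{Pre}_i(\pi) is the set of groups preceding i in \pi
\phi_i = \frac{1}{|N|!} \sum_{\pi \in \Pi(N)}
  \bigl[ v(\mathrm{Pre}_i(\pi) \cup \{i\}) - v(\mathrm{Pre}_i(\pi)) \bigr]

% Asymmetric Shapley value: the same average, restricted to the orderings
% \Pi_{\prec}(N) that respect the assumed precedence (here G \prec D)
\phi_i^{\mathrm{asym}} = \frac{1}{|\Pi_{\prec}(N)|} \sum_{\pi \in \Pi_{\prec}(N)}
  \bigl[ v(\mathrm{Pre}_i(\pi) \cup \{i\}) - v(\mathrm{Pre}_i(\pi)) \bigr]

% Both versions are efficient: \sum_{i \in N} \phi_i = v(N) - v(\emptyset)
```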

The paper derives efficient algorithms for computing both local (per-patient) and global asymmetric Shapley values. Local values support inference. Global values aid interpretation by decomposing any predictive performance metric into contributions from individual features. The researchers illustrate the method with a leading clinical example: predicting progression-free survival for colorectal cancer patients. In this setting, isolating the contribution of genomic markers from a web of clinical confounders matters for advancing personalized medicine. The work is a step toward more transparent and trustworthy AI in healthcare, where understanding why a model makes a prediction is as important as the prediction itself.
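To make the decomposition concrete, here is a brute-force sketch in Python. It enumerates orderings of three hypothetical feature groups (`clinical`, `stage`, `genes`) and uses made-up performance values `V`, so it only illustrates the definition. It is not the paper's efficient algorithm. The helper `shapley` is hypothetical. The precedence pair `("genes", "stage")` is our assumption, standing in for genomic effects mediated by disease state.

```python
from itertools import permutations

# Hypothetical performance v(S) of a model using only the feature groups in S
# (e.g. a gain in C-index over a null model). Toy numbers, not real results.
GROUPS = ("clinical", "stage", "genes")
V = {
    frozenset(): 0.00,
    frozenset({"clinical"}): 0.10,
    frozenset({"stage"}): 0.20,
    frozenset({"genes"}): 0.15,
    frozenset({"clinical", "stage"}): 0.25,
    frozenset({"clinical", "genes"}): 0.22,
    frozenset({"stage", "genes"}): 0.24,
    frozenset({"clinical", "stage", "genes"}): 0.30,
}


def shapley(players, value, precedes=()):
    """Average each player's marginal contribution over allowed orderings.

    precedes: pairs (a, b) meaning a must enter before b. With no pairs this
    is the ordinary (symmetric) Shapley value; with pairs it is the
    asymmetric variant with uniform weights over the consistent orderings.
    """
    orders = [
        order for order in permutations(players)
        if all(order.index(a) < order.index(b) for a, b in precedes)
    ]
    phi = {p: 0.0 for p in players}
    for order in orders:
        coalition = frozenset()
        for p in order:
            # Marginal contribution of p given the groups already entered.
            phi[p] += value[coalition | {p}] - value[coalition]
            coalition = coalition | {p}
    return {p: total / len(orders) for p, total in phi.items()}


if __name__ == "__main__":
    full = frozenset(GROUPS)
    add_on = V[full] - V[full - {"genes"}]  # genes entered last
    sym = shapley(GROUPS, V)
    asym = shapley(GROUPS, V, precedes=[("genes", "stage")])
    print(f"add-on gain for genes: {add_on:.3f}")
    print(f"symmetric Shapley:     {sym['genes']:.3f}")
    print(f"asymmetric Shapley:    {asym['genes']:.3f}")
    # Both Shapley variants split v(full) - v(empty) exactly across groups.
    print(f"sum of asymmetric values: {sum(asym.values()):.3f}")
```

With these toy numbers, the genes group gets 0.050 from the add-on metric, about 0.093 as a symmetric Shapley value, and 0.140 as an asymmetric one. The asymmetric values still sum to v(full) - v(empty) = 0.300. Enumerating all k! orderings is feasible only for a few groups. A real global v(S) would also require refitting or marginalizing the model for each subset, and that cost is the problem efficient algorithms such as the paper's are meant to address.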

Key Points
  • Proposes asymmetric Shapley values as an alternative to add-on performance gain in clinical AI, accounting for collinearity and the known direction of dependencies.
  • Provides efficient algorithms for local/global interpretation, decomposing model performance into honest feature contributions.
  • Demonstrated on a real-world task: predicting progression-free survival in colorectal cancer using genomic and clinical data.

Why It Matters

Enables more trustworthy and interpretable AI for precision medicine, crucial for clinical adoption and understanding disease drivers.