α-TCAV unifies CAV explainability with stable, smooth concept scores
New framework fixes statistical instability in AI concept explanations by replacing flawed indicator functions.
Concept Activation Vectors (CAVs) are widely used to interpret deep learning models by measuring how much a concept (e.g., 'stripes') influences predictions. However, the standard Testing with CAVs (TCAV) method suffers from statistical instability. In a new paper, Ekkehard Schnoor and colleagues analyze the stochastic nature of CAVs and identify a fundamental flaw: the TCAV score relies on a discontinuous indicator function, counting only whether each input's sensitivity to the concept is positive. This introduces variance that does not decay as more samples are drawn, so results stay unreliable even with increased sampling. The authors prove this flaw affects major CAV variants like PatternCAV, FastCAV, and ridge regression-based CAVs, and show that current state-of-the-art choices lack theoretical justification.
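To make the indicator problem concrete, here is a minimal sketch of the standard TCAV score: the fraction of inputs whose directional derivative along the CAV is positive. The function name, the variable names, and the toy sensitivity values are illustrative assumptions, not taken from the paper; the small shift merely stands in for the noise of a re-sampled CAV.

```python
def tcav_score(sensitivities):
    """Fraction of inputs whose directional derivative along the CAV is > 0."""
    return sum(1 for s in sensitivities if s > 0) / len(sensitivities)

# Hypothetical directional derivatives for five inputs, all close to zero.
base = [0.02, -0.01, 0.03, -0.02, 0.01]

# The same inputs after a tiny shift (assumed stand-in for CAV sampling noise).
shifted = [s - 0.025 for s in base]

score_base = tcav_score(base)        # 3 of 5 values are positive
score_shifted = tcav_score(shifted)  # only one value stays positive

print(score_base, score_shifted)
```

A shift of 0.025 in the sensitivities moves the score from 3/5 to 1/5, because the indicator discards magnitude and reacts only to sign changes.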
To solve this, the team introduces α-TCAV, a generalized framework that replaces the indicator with a smooth, parameterized function. This yields a unified probabilistic formulation that subsumes both TCAV and Multi-TCAV. The parameter α can be tuned either to imitate Multi-TCAV at substantially lower computational cost or to obtain a calibrated, Bayes-optimal probabilistic measure of a concept's influence. The paper provides principled guidance on setting α and delivers a surprising practical recommendation: for maximum reliability, allocate the full sampling budget to a single CAV rather than splitting it across several. The result is intended to make concept-based explanations more robust and theoretically grounded for critical AI applications.
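The summary does not state the functional form of the smooth replacement, so the sketch below substitutes a logistic sigmoid with a sharpness parameter `alpha` purely as an assumed stand-in. It is not the paper's definition of α-TCAV, and the paper's α may be parameterized differently. The toy sensitivities and the shift are hypothetical values chosen to sit near zero.

```python
import math

def smooth_score(sensitivities, alpha):
    """Mean of sigmoid(alpha * s): an assumed smooth surrogate for the indicator."""
    total = sum(1.0 / (1.0 + math.exp(-alpha * s)) for s in sensitivities)
    return total / len(sensitivities)

# Hypothetical directional derivatives, and the same values after a tiny shift.
base = [0.02, -0.01, 0.03, -0.02, 0.01]
shifted = [s - 0.025 for s in base]

# Moderate sharpness: the score responds gradually to the shift.
smooth_base = smooth_score(base, alpha=10.0)
smooth_shifted = smooth_score(shifted, alpha=10.0)
smooth_gap = abs(smooth_base - smooth_shifted)

# Very large sharpness: the sigmoid approaches the hard 0/1 indicator.
near_hard = smooth_score(base, alpha=1e4)

print(smooth_gap, near_hard)
```

In this toy setting the smooth score changes by well under 0.1 under the shift, while the sharp limit recovers the indicator-based fraction of positive sensitivities. The design point is that a continuous function of the sensitivities lets small perturbations of the CAV produce small changes in the score.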
- Identifies a fundamental flaw in TCAV: a discontinuous indicator function causes non-decaying variance.
- Introduces α-TCAV with a smooth parameterized function for stable, probabilistic concept influence scores.
- Recommends allocating the full sampling budget to a single CAV instead of splitting it across several, challenging established practice.
Why It Matters
Concept-based explanations become more reliable for critical applications such as healthcare and finance, and α-TCAV can imitate Multi-TCAV at substantially lower computational cost.