Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions
A new framework breaks down model uncertainty by class, revealing which predictions could be dangerously wrong.
Researchers Mame Diarra Toure and David A. Stephens have published a paper introducing a method to decompose the epistemic uncertainty of AI models into per-class contributions. Current Bayesian deep learning summarizes a model's ignorance with a single scalar, mutual information (MI), which cannot indicate whether the uncertainty concerns a benign class or a safety-critical one, a serious limitation for applications like medical diagnosis. Their framework, detailed in the arXiv preprint 'Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions,' addresses this by breaking MI down into a vector of per-class contributions C_k(x), derived from the mean and variance of the model's predicted probabilities across posterior samples. The decomposition is based on a second-order Taylor expansion, corrects for boundary suppression, and allows fair comparison between rare and common classes.
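The paper's exact estimator is not reproduced in this summary, but the second-order Taylor expansion it describes admits a standard form: expanding the entropy of the predictive distribution around the posterior-mean probabilities gives MI ≈ Σ_k Var[p_k] / (2 p̄_k), so each summand can serve as a per-class contribution C_k(x). The sketch below computes that plain Taylor term from posterior samples; the paper's additional corrections, such as the boundary-suppression fix and the adjustment for rare-versus-common classes, are not reproduced here, and the function name is this summary's own.

```python
import numpy as np

def per_class_uncertainty(probs):
    """Per-class epistemic uncertainty from posterior samples.

    probs: array of shape (S, K) holding predicted class probabilities for
    one input under S posterior samples (e.g., MC-dropout draws or deep
    ensemble members).

    Returns (C, mi_approx): C[k] is the contribution of class k; their sum
    approximates the mutual information via a second-order Taylor expansion
    of the entropy around the posterior-mean prediction. This is the plain
    Taylor term, not the paper's corrected estimator.
    """
    p_bar = probs.mean(axis=0)   # posterior-mean probability per class
    p_var = probs.var(axis=0)    # posterior variance per class
    # Expanding H(p) to second order around p_bar yields
    # MI ~= sum_k Var[p_k] / (2 * p_bar_k); each summand is one C_k.
    C = p_var / (2.0 * np.clip(p_bar, 1e-12, None))
    return C, C.sum()

# Toy usage: 32 posterior draws over 3 classes, sampled from a Dirichlet.
rng = np.random.default_rng(0)
draws = rng.dirichlet([5.0, 1.0, 0.5], size=32)
C, mi_approx = per_class_uncertainty(draws)
print(C, mi_approx)  # per-class contributions and their sum (approximate MI)
```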
The practical impact is significant, especially in safety-critical fields. Validation on a diabetic retinopathy diagnosis task showed that using the critical-class component of C_k to decide when to abstain reduced selective prediction risk (the error rate on the cases where the model still makes a prediction rather than abstaining) by 34.7% compared to using standard MI and by 56.2% compared to simple variance baselines. The method also improved out-of-distribution detection and proved more robust to label noise in controlled studies. The research underscores that for reliable AI, knowing *where* uncertainty lies is as important as measuring how much of it there is, pushing the field toward more interpretable and trustworthy uncertainty quantification for real-world deployment.
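The summary does not spell out the abstention protocol behind the 34.7% figure. A common selective-prediction setup, shown below as a hedged sketch under that assumption, ranks inputs by an uncertainty score (MI, or the C_k of a safety-critical class) and measures the error rate on the retained fraction; comparing the two scores at matched coverage is the kind of comparison the paper reports.

```python
import numpy as np

def selective_risk(scores, errors, coverage=0.8):
    """Error rate on the retained set when the model abstains on the
    (1 - coverage) fraction of inputs with the highest uncertainty score.

    scores: (N,) uncertainty per input -- e.g., MI, or the critical-class
            component C_k for a safety-critical class k.
    errors: (N,) binary array, 1 where the model's prediction is wrong.
    """
    n_keep = int(np.ceil(coverage * len(scores)))
    keep = np.argsort(scores)[:n_keep]  # lowest-uncertainty inputs first
    return float(errors[keep].mean())
```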
- Decomposes mutual information (MI) into a per-class vector C_k, revealing *which* classes a model is uncertain about, not just how uncertain it is.
- Validated on medical AI: Reduced selective prediction risk in diabetic retinopathy diagnosis by 34.7% over standard MI and 56.2% over variance baselines.
- Includes a diagnostic for the quality of the Taylor approximation (a hedged sketch of one such check follows this list) and shows the method is less sensitive to aleatoric (data) noise than MI under Bayesian training.
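The paper's diagnostic is not specified in this summary. One natural check, sketched below as an assumption, compares the Taylor-based sum of contributions against the exact sample-based MI and flags inputs where the relative gap is large, since the per-class decomposition is only trustworthy where the expansion is accurate.

```python
import numpy as np

def entropy(p):
    """Shannon entropy along the last axis, in nats."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def approximation_gap(probs):
    """Relative gap between exact MI and its second-order Taylor surrogate.

    probs: (S, K) posterior-sample probabilities for one input. A large gap
    signals that the per-class decomposition may be unreliable there.
    """
    p_bar = probs.mean(axis=0)
    mi_exact = entropy(p_bar) - entropy(probs).mean()  # H(E[p]) - E[H(p)]
    mi_taylor = (probs.var(axis=0) / (2.0 * np.clip(p_bar, 1e-12, None))).sum()
    return abs(mi_exact - mi_taylor) / max(mi_exact, 1e-12)
```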
Why It Matters
Enables safer AI in medicine and autonomous systems by pinpointing dangerous uncertainties, not just quantifying their amount.