New paper proves no AI feature ranking is reliable when data is correlated

arXiv cs.LG May 23, 2026

⚡68% of datasets show instability; DASH method offers a mathematically optimal fix.

Deep Dive

A team of researchers (Caraker, Arnold, Rhoads) has released a paper proving that no feature attribution ranking can simultaneously satisfy three desirable properties (faithfulness, stability, and completeness) when features are collinear. For a collinear pair, the ranking effectively reduces to a coin flip. The proof is quantitative: the attribution ratio diverges as 1/(1-rho^2) for gradient boosting, is infinite for Lasso, and converges for random forests. The authors also characterize the entire design space, in which only two families of methods exist: faithful-complete methods that are unstable (rankings flip up to 50% of the time), and ensemble methods such as DASH that are stable and report ties for symmetric features.
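To give a feel for how quickly the 1/(1-rho^2) factor quoted above grows, here is a small worked calculation. It only evaluates the formula as stated in this summary; the function name attribution_ratio is a hypothetical label, and the summary does not say exactly which two attributions the ratio compares.

```python
def attribution_ratio(rho: float) -> float:
    """Evaluate 1 / (1 - rho^2), the growth factor quoted for gradient boosting.

    Assumption: rho is the correlation between the two collinear features.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho * rho)


# Print the factor for increasingly correlated feature pairs.
for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"rho={rho:.2f}  ratio={attribution_ratio(rho):.2f}")
```

The factor is about 1.33 at rho = 0.5, about 5.26 at rho = 0.9, and about 50.25 at rho = 0.99, so it grows without bound as the correlation approaches 1.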

To mitigate the problem, the team introduces DASH (Diversified Aggregation of SHAP), a Pareto-optimal ensemble method that achieves the Cramér-Rao variance bound, together with a tight formula for the ensemble size. In a survey of 77 public datasets, 68% exhibited attribution instability. The framework includes practical diagnostics, namely a Z-test workflow and a single-model screening tool, and has direct consequences for fairness auditing: SHAP-based proxy discrimination audits are provably unreliable under collinearity. The impossibility theorem, the design space theorem, and the diagnostics are all mechanically verified in Lean 4 (305 theorems from 16 axioms, 0 sorry, meaning no proof is left as an unfinished placeholder). The result is presented as the first formally verified impossibility in explainable AI.
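The idea of aggregating attributions across models and reporting a tie when a Z-test cannot separate two features can be illustrated with a toy sketch. This is not the paper's DASH algorithm or its Z-test workflow, whose details are not given here. It is a generic paired Z-test over simulated numbers, and every name in it (simulate_attributions, flip_rate, paired_z, ensemble_rank) is hypothetical. The simulated values stand in for per-model SHAP importances of two collinear features A and B.

```python
import math
import random
import statistics


def simulate_attributions(n_models: int, seed: int = 0):
    """Toy stand-in for per-model importances of two collinear features.

    Assumption: each model splits a total credit of 1.0 between A and B
    at a random proportion, mimicking seed-dependent credit assignment.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_models):
        share = rng.random()  # fraction of the credit given to feature A
        pairs.append((share, 1.0 - share))
    return pairs


def flip_rate(pairs) -> float:
    """Fraction of individual models that rank B above A."""
    return sum(1 for a, b in pairs if b > a) / len(pairs)


def paired_z(pairs) -> float:
    """Z statistic for the mean per-model difference A - B."""
    diffs = [a - b for a, b in pairs]
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if se == 0.0:
        # All models agree exactly; avoid dividing by zero.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / se


def ensemble_rank(pairs, z_crit: float = 1.96) -> str:
    """Report a tie unless the averaged difference is significant."""
    z = paired_z(pairs)
    if abs(z) < z_crit:
        return "tie"
    return "A > B" if z > 0 else "B > A"


toy = simulate_attributions(n_models=200, seed=0)
print("single-model flip rate:", flip_rate(toy))
print("ensemble verdict:", ensemble_rank(toy))
```

In this toy setup any single model ranks A over B roughly half the time, while the aggregated comparison only commits to an order when the per-model differences are consistently one-sided. The 1.96 threshold corresponds to a two-sided 5% level and is an illustrative choice, not a value taken from the paper.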

Key Points

No feature ranking can be faithful, stable, and complete under collinearity; rankings flip up to 50% of the time for collinear pairs.
DASH (Diversified Aggregation of SHAP) is a provably Pareto-optimal ensemble method that achieves the Cramér-Rao variance bound.
68% of 77 surveyed public datasets exhibit attribution instability; the findings are formally verified with 305 Lean 4 theorems.

Why It Matters

If the result holds, SHAP-based proxy discrimination audits cannot be relied on when features are collinear, which is a direct challenge to current AI governance practices that depend on such audits.
