Decomposing Observational Multiplicity in Decision Trees: Leaf and Structural Regret
Research finds structural regret can drive over 15x more variability than leaf-level noise in credit-scoring models.
A new research paper by Mustafa Cavus tackles a core problem in machine learning reliability: predictive multiplicity. This phenomenon occurs when multiple models perform equally well on the same data, creating uncertainty about which one to trust. The paper, 'Decomposing Observational Multiplicity in Decision Trees,' focuses on a key source called observational multiplicity, which stems from the inherent noise in collected training labels. While previous work established theory for smooth models like logistic regression, this research provides the first rigorous framework for non-smooth, widely used decision tree classifiers.
The core innovation is the decomposition of observational multiplicity into two measurable components: leaf regret and structural regret. Leaf regret quantifies the prediction variability caused by finite-sample noise within a single, fixed leaf of a decision tree. Structural regret captures the more significant instability introduced by the tree's learned branching structure itself, which can change dramatically with different data samples. Experiments on credit risk scoring datasets revealed that structural regret is the dominant factor, responsible for over 15 times more variability than leaf regret in some cases.
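The decomposition can be illustrated with a small resampling experiment. The sketch below is not the paper's method, just a minimal stand-in: it fits a depth-1 "stump" on toy 1-D data, then redraws labels from the (assumed known) true conditional distribution to simulate label noise. Leaf regret is approximated by the variance of a query point's predicted probability when the split is frozen and only the leaf estimates are refit; structural regret by the variance when the split itself is refit on each noisy sample. All names (`fit_stump`, `leaf_probs`, the data-generating process) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: the class probability shifts at x = 0.5
# (illustrative setup, not the paper's credit-scoring data).
n = 400
X = rng.uniform(0, 1, n)
p_true = np.where(X < 0.5, 0.2, 0.8)           # true P(y=1 | x)
y = (rng.uniform(0, 1, n) < p_true).astype(int)

def fit_stump(X, y):
    """Fit a depth-1 tree: pick the split minimizing weighted Gini impurity."""
    gini = lambda s: 1.0 - (s.mean() ** 2 + (1 - s.mean()) ** 2)
    best_t, best_g = None, np.inf
    for t in np.quantile(X, np.linspace(0.05, 0.95, 19)):
        left, right = y[X < t], y[X >= t]
        if len(left) == 0 or len(right) == 0:
            continue
        g = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if g < best_g:
            best_t, best_g = t, g
    return best_t

def leaf_probs(X, y, t):
    """Per-leaf estimates of P(y=1) for a given split threshold t."""
    return y[X < t].mean(), y[X >= t].mean()

x_query = 0.5                    # point near the true boundary
t0 = fit_stump(X, y)             # structure fit once on the original labels
B = 200
leaf_preds, struct_preds = [], []
for _ in range(B):
    # Redraw labels from the true conditional -- a stand-in for label noise.
    yb = (rng.uniform(0, 1, n) < p_true).astype(int)
    # Leaf regret: split t0 frozen, only the leaf estimates vary.
    pl, pr = leaf_probs(X, yb, t0)
    leaf_preds.append(pl if x_query < t0 else pr)
    # Structural regret: the split itself is refit on the noisy labels,
    # so the query point may land in a different leaf each time.
    tb = fit_stump(X, yb)
    pl, pr = leaf_probs(X, yb, tb)
    struct_preds.append(pl if x_query < tb else pr)

leaf_regret = float(np.var(leaf_preds))
structural_regret = float(np.var(struct_preds))
print(f"leaf regret:       {leaf_regret:.5f}")
print(f"structural regret: {structural_regret:.5f}")
```

The intuition matches the paper's finding: near a decision boundary, refitting the structure can move the query point between leaves with very different estimates, so structural variability dwarfs the within-leaf sampling noise.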
This decomposition isn't just theoretical; it has practical applications for improving AI safety. The authors demonstrate that using these regret measures as an abstention mechanism allows a model to identify uncertain predictions and refuse to answer. This selective prediction approach significantly boosted performance on the retained data subsets, raising recall from 92% to 100%. The work provides data scientists with new tools to audit, interpret, and build more trustworthy tree-based models, aligning with critical advances in algorithmic transparency and safety.
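The abstention idea itself is simple to sketch: refuse to predict on examples whose regret score exceeds a threshold, and evaluate recall only on what is kept. The example below uses simulated probabilities and a simulated per-example regret score (standing in for the paper's leaf/structural regret estimates); the numbers it produces are illustrative, not the paper's results.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: predicted probabilities, true labels, and a
# per-example regret score. Here regret is simulated as the magnitude
# of the noise corrupting each prediction; in practice it would come
# from the leaf/structural regret estimates.
n = 1000
y_true = rng.integers(0, 2, n)
noise = rng.normal(0, 0.3, n)
p_hat = np.clip(0.5 + (y_true - 0.5) * 0.6 + noise, 0, 1)
regret = np.abs(noise)

def selective_recall(y_true, p_hat, regret, tau):
    """Recall and coverage on the subset the model keeps (regret <= tau)."""
    keep = regret <= tau
    y_k = y_true[keep]
    pred_k = (p_hat[keep] >= 0.5).astype(int)
    tp = np.sum((pred_k == 1) & (y_k == 1))
    fn = np.sum((pred_k == 0) & (y_k == 1))
    return tp / (tp + fn), keep.mean()

# Tighter thresholds keep fewer examples but make fewer mistakes on them.
for tau in (np.inf, 0.3, 0.1):
    rec, cov = selective_recall(y_true, p_hat, regret, tau)
    print(f"tau={tau}: recall={rec:.3f}, coverage={cov:.2f}")
```

The trade-off this exposes is the standard one in selective prediction: recall on the kept subset rises as the regret threshold tightens, at the cost of answering fewer queries.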
- Introduces 'leaf regret' and 'structural regret' to quantify two sources of uncertainty in decision trees.
- Finds structural regret is the primary driver, causing over 15x more variability than leaf regret in credit datasets.
- Shows using these metrics for selective prediction can improve model safety, raising recall to 100% on stable data.
Why It Matters
Provides a formal framework to measure and improve the reliability of widely used decision tree models in high-stakes applications like finance.