Research & Papers

Shapley Value-Guided Adaptive Ensemble Learning for Explainable Financial Fraud Detection with U.S. Regulatory Compliance Validation

New AI ensemble uses SHAP values to dynamically weight predictions, achieving a 0.9245 cross-validation AUC-ROC while validating U.S. regulatory compliance.

Deep Dive

A new research paper introduces a significant advancement in the fight against financial fraud, which costs U.S. institutions over $32 billion annually. The work, by Mohammad Nasir Uddin and Md Munna Aziz, tackles the critical barrier preventing AI adoption in finance: the 'black box' nature of complex models that fail regulatory transparency mandates such as OCC Bulletin 2011-12. The study first provides a rigorous benchmark, evaluating explanation quality on metrics such as faithfulness and stability. It found that XGBoost with TreeExplainer achieved near-perfect explanation stability (Kendall's W = 0.9912), while LSTM models performed poorly on the same metrics.
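The stability figure reported above is Kendall's coefficient of concordance W, which measures how consistently different runs rank the same features by importance (W = 1 means identical rankings). A minimal sketch of the standard formula, applied to feature rankings derived from SHAP values; the function name and toy data are illustrative, not from the paper:

```python
import numpy as np

def kendalls_w(rankings: np.ndarray) -> float:
    """Kendall's coefficient of concordance W for m rankings of n items.

    rankings: array of shape (m, n); each row ranks the n features (1..n),
    e.g. by descending |SHAP value| in one retraining run.
    W = 1 means perfect agreement (perfectly stable explanations).
    """
    m, n = rankings.shape
    totals = rankings.sum(axis=0)                 # total rank per feature
    s = ((totals - totals.mean()) ** 2).sum()     # spread of rank totals
    return 12.0 * s / (m**2 * (n**3 - n))

# Toy example: rank 5 features across 3 retraining runs.
run_ranks = np.array([
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 5],
    [1, 3, 2, 4, 5],   # one swapped pair pulls W slightly below 1
])
print(round(kendalls_w(run_ranks), 4))  # → 0.9556
```

With all three rankings identical, the same function returns exactly 1.0, which is the regime the reported W = 0.9912 for TreeExplainer sits close to.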

The core innovation is the SHAP-Guided Adaptive Ensemble (SGAE) algorithm. Unlike static ensembles, SGAE dynamically adjusts the weight given to each model's prediction (XGBoost, LSTM, or GNN) for each individual transaction. The adjustment is based on the agreement of their SHAP (Shapley Additive exPlanations) values, a game-theoretic approach to feature attribution. SGAE achieved an AUC-ROC of 0.8837 on held-out data and 0.9245 in cross-validation. The researchers also benchmarked three neural architectures—LSTM, Transformer, and GNN-GraphSAGE—on the large IEEE-CIS dataset, with GraphSAGE achieving an AUC-ROC of 0.9248.
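The paper's exact weighting rule is not reproduced here, but the idea of per-transaction weights driven by SHAP agreement can be sketched as follows. This illustrative version (all names and the cosine-similarity rule are assumptions, not the authors' formula) weights each model by how closely its SHAP attribution vector for the current transaction agrees with the ensemble's consensus attribution:

```python
import numpy as np

def shap_agreement_weights(shap_vals: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-transaction model weights from SHAP agreement (illustrative).

    shap_vals: (n_models, n_features) SHAP attributions for ONE transaction.
    Each model is weighted by the cosine similarity between its attribution
    vector and the mean ("consensus") attribution, then weights are
    normalised to sum to 1. The paper's actual rule may differ.
    """
    consensus = shap_vals.mean(axis=0)
    denom = np.linalg.norm(shap_vals, axis=1) * np.linalg.norm(consensus)
    sims = shap_vals @ consensus / (denom + eps)
    sims = np.clip(sims, 0.0, None)      # zero out models that flatly disagree
    if sims.sum() < eps:                 # degenerate case: fall back to uniform
        return np.full(len(shap_vals), 1.0 / len(shap_vals))
    return sims / sims.sum()

def sgae_predict(probs: np.ndarray, shap_vals: np.ndarray) -> float:
    """Blend per-model fraud probabilities using SHAP-agreement weights."""
    return float(shap_agreement_weights(shap_vals) @ probs)

# Toy example: three models (say XGBoost, LSTM, GNN) scoring one transaction.
probs = np.array([0.90, 0.40, 0.85])
shap_vals = np.array([
    [0.50, -0.10, 0.30],
    [-0.40, 0.20, -0.10],   # attribution disagrees with the other two models
    [0.45, -0.05, 0.25],
])
score = sgae_predict(probs, shap_vals)  # dominated by the two agreeing models
```

In this toy case the dissenting model's weight is clipped to zero, so the blended score lands between the two agreeing models' probabilities (0.85 and 0.90) rather than being dragged down by the outlier, which is the intuition behind agreement-based weighting.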

Crucially, the entire framework is designed with compliance in mind. All results and explanation methodologies are explicitly mapped to the requirements of U.S. regulations including OCC Bulletin 2011-12, Federal Reserve SR 11-7, and BSA/AML rules. This provides a clear, auditable trail from model prediction back to the contributing features, directly answering regulators' demands for explainable AI (XAI). The paper represents a practical blueprint for deploying high-performance, yet compliant, machine learning systems in heavily regulated financial environments.

Key Points
  • Introduced the SHAP-Guided Adaptive Ensemble (SGAE), which dynamically weights model predictions per transaction based on SHAP value agreement, achieving a cross-validation AUC-ROC of 0.9245.
  • Provided a complete three-architecture benchmark on 590,540 transactions, with GNN-GraphSAGE scoring AUC-ROC 0.9248 and F1=0.6013, outperforming LSTM and Transformer models.
  • Mapped all model explanations and performance metrics explicitly to U.S. regulatory requirements (OCC Bulletin 2011-12, Fed SR 11-7), addressing the critical 'black box' auditability problem in finance.

Why It Matters

Enables banks to deploy more accurate AI fraud detection while providing the transparent, auditable explanations required by U.S. financial regulators.