[P] Using SHAP to explain Unsupervised Anomaly Detection on PCA-anonymized data (Credit Card Fraud). Is this a valid approach for a thesis?
A BSc student asks whether explaining PCA-transformed features with SHAP is valid for an XAI thesis.
A computer science student's Bachelor's thesis is generating significant discussion on the practical limits of Explainable AI (XAI). The project uses SHAP (SHapley Additive exPlanations) to explain the decisions of an unsupervised anomaly detection model, a Stacked Autoencoder that flags credit card fraud when a transaction's reconstruction error is high. The model is trained on the popular Kaggle Credit Card Fraud dataset, in which 28 of the features (V1-V28) are principal components from a PCA transformation applied to anonymize the original data for privacy.
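To make the setup concrete, here is a minimal sketch of the reconstruction-error approach, assuming the standard Kaggle creditcard.csv layout (Time, V1-V28, Amount, Class). The layer sizes, training settings, and percentile threshold are illustrative assumptions, not the student's exact configuration.

```python
# Sketch: unsupervised fraud detection via autoencoder reconstruction error.
# Assumes the Kaggle creditcard.csv file; architecture and threshold are illustrative.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

df = pd.read_csv("creditcard.csv")
X = df.drop(columns=["Class"]).values   # Time, V1-V28, Amount
y = df["Class"].values                  # 1 = fraud, 0 = legitimate

X = StandardScaler().fit_transform(X)
X_train = X[y == 0]                     # train on legitimate transactions only

n_features = X.shape[1]
autoencoder = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),   # bottleneck
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(n_features, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_train, X_train, epochs=10, batch_size=256, validation_split=0.1)

# Per-sample reconstruction error; anything above a chosen percentile is flagged as anomalous.
recon = autoencoder.predict(X, verbose=0)
errors = np.mean((X - recon) ** 2, axis=1)
threshold = np.percentile(errors[y == 0], 99.5)
flagged = errors > threshold
```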
The student's central dilemma is whether applying XAI techniques like SHAP to these abstract, PCA-derived features yields meaningful explanations. While SHAP can identify which principal components (e.g., V14, V17) most influenced the model's fraud flag, it cannot translate that back to human-understandable, real-world features like 'transaction amount' or 'merchant location.' This raises a critical question for the field: does explaining a model's behavior in the abstract feature space of a PCA transformation constitute a legitimate contribution, or does it render the XAI exercise functionally 'useless' for end-users who need actionable insights?
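One way to attribute the anomaly score to individual inputs, as described above, is to treat the per-sample reconstruction error as a scalar model output and explain it with a model-agnostic SHAP explainer. This sketch assumes the autoencoder, df, X, X_train, and flagged objects from the previous snippet; the background sample size and nsamples are illustrative, and KernelExplainer is only one of several SHAP explainers that could be used here.

```python
# Sketch: attributing reconstruction error to the (mostly PCA-derived) input features with SHAP.
import numpy as np
import shap

feature_names = df.drop(columns=["Class"]).columns.tolist()

def reconstruction_error(x):
    # Scalar anomaly score per sample, used as the "model output" for SHAP.
    recon = autoencoder.predict(x, verbose=0)
    return np.mean((x - recon) ** 2, axis=1)

# Model-agnostic explainer; a small background sample keeps runtime manageable.
background = shap.sample(X_train, 100)
explainer = shap.KernelExplainer(reconstruction_error, background)

# Explain a handful of flagged transactions.
flagged_idx = np.where(flagged)[0][:5]
shap_values = explainer.shap_values(X[flagged_idx], nsamples=200)

shap.summary_plot(shap_values, X[flagged_idx], feature_names=feature_names)
```

Even with these attributions, the output names components such as V14 or V17 rather than raw transaction fields, which is precisely the interpretability gap the thesis debates.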
The online debate highlights a fundamental tension in applied machine learning between model performance, data privacy, and interpretability. The community's feedback will help determine whether the thesis successfully navigates this challenge by contributing a framework for 'abstract interpretability', or whether it instead underscores a major limitation of current XAI methods when applied to pre-processed, anonymized datasets.
- Project uses SHAP to explain a Stacked Autoencoder for fraud detection on the PCA-anonymized Kaggle dataset.
- Core challenge: Explanations reference abstract PCA components (V1-V28), not original, interpretable features.
- Thesis debate centers on whether 'abstract interpretability' is a valid XAI contribution for real-world use.
Why It Matters
The project tests the real-world applicability of XAI to privacy-protected data, a common hurdle in finance and healthcare.