GNN Fraud Detection Model Trails SOTA with 0.87 AUC on IEEE Dataset

r/MachineLearning May 27, 2026

⚡ Graph Neural Network struggles to match SOTA performance on a standard fraud benchmark

Deep Dive

Researchers working on an explainable GNN model for fraud detection have hit a performance wall. They used the widely adopted IEEE-CIS Fraud Detection dataset, which already ships with substantial feature engineering, and built a heterogeneous graph in which transaction attributes such as device, transaction ID, and amount become nodes connected to the transaction nodes. They trained three popular GNN architectures (GCN, GraphSAGE, and GAT), and all three performed similarly, averaging an AUC of 0.87, a PR-AUC of 0.52, a recall at 5% of 0.57, and a precision at 5% of 0.37. These numbers fall well short of the state-of-the-art results reported in recent fraud detection literature, so the team turned to the community for input on what might be going wrong.
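As a rough illustration of the kind of graph described above, the sketch below turns a list of transactions into per-type edge lists that link each transaction node to shared attribute nodes. It is a minimal sketch under stated assumptions, not the team's code: the column names `DeviceInfo`, `card1`, and `TransactionAmt` are modeled on IEEE-CIS columns, and the log-scale amount bucketing and the helpers `amount_bucket` and `build_hetero_graph` are hypothetical.

```python
import math

# Hypothetical attribute node types; the article only says "device, ID, amount".
ATTRIBUTE_TYPES = ("device", "card", "amount_bucket")


def amount_bucket(amount: float) -> str:
    """Discretize a transaction amount into a log10-scale bucket (an assumed scheme)."""
    return f"amt_{int(math.log10(amount + 1.0))}"


def build_hetero_graph(transactions):
    """Map each attribute value to a node and link it to every transaction carrying it.

    Returns (vocab, edges): vocab[type] maps a raw value to its node index within that
    type, and edges[type] is a list of (transaction_index, attribute_node_index) pairs.
    """
    vocab = {t: {} for t in ATTRIBUTE_TYPES}
    edges = {t: [] for t in ATTRIBUTE_TYPES}
    for txn_idx, txn in enumerate(transactions):
        # Column names modeled on IEEE-CIS (DeviceInfo, card1, TransactionAmt); assumed.
        values = {
            "device": txn.get("DeviceInfo"),
            "card": txn.get("card1"),
            "amount_bucket": amount_bucket(txn["TransactionAmt"]),
        }
        for attr_type, value in values.items():
            if value is None:  # a missing attribute simply gets no edge
                continue
            # A value seen for the first time receives the next free index for its type.
            node_idx = vocab[attr_type].setdefault(value, len(vocab[attr_type]))
            edges[attr_type].append((txn_idx, node_idx))
    return vocab, edges


if __name__ == "__main__":
    sample = [
        {"TransactionAmt": 49.99, "DeviceInfo": "iOS Device", "card1": 13926},
        {"TransactionAmt": 120.0, "DeviceInfo": "iOS Device", "card1": 2755},
        {"TransactionAmt": 35.0, "DeviceInfo": None, "card1": 13926},
    ]
    vocab, edges = build_hetero_graph(sample)
    for attr_type in ATTRIBUTE_TYPES:
        print(attr_type, vocab[attr_type], edges[attr_type])
```

Two transactions that share a device or card end up two hops apart, which is what lets message passing carry signal between them. Edge lists like these could be converted into `edge_index` tensors for a heterogeneous GNN library such as PyTorch Geometric. How continuous attributes like amount are discretized matters: a coarse bucket can turn into a hub node linked to a large share of all transactions, which may dilute the structural signal.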

Possible culprits start with graph construction: the node and edge definitions may not capture the fraud patterns present in the data. The training setup could also be suboptimal, for example through improper handling of class imbalance (fraud datasets typically have fewer than 5% positive samples). And although the team relies on the dataset's pre-built features, those features may miss domain-specific transformations or the temporal dynamics that SOTA models exploit. Because all three GNN variants perform nearly identically, the bottleneck more likely lies in the graph structure or data preprocessing than in the model architecture itself. Addressing these issues could help close the gap and make the GNN a viable tool for explainable fraud detection in production.
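Two of the pitfalls above, class imbalance and temporal dynamics, can be sketched in a few lines. The block below is a hedged, dependency-free illustration rather than the team's pipeline: it computes a positive-class weight from the label ratio, applies it in a weighted binary cross-entropy, and derives one simple temporal feature (time since the same card's previous transaction). The function names and the `card1` and `TransactionDT` keys are assumptions chosen for illustration.

```python
import math


def positive_class_weight(labels):
    """Negatives-to-positives ratio, a common weight for the rare fraud class."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    return n_neg / n_pos


def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy where each positive example's loss is scaled by pos_weight."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def time_since_previous(rows, entity_key="card1", time_key="TransactionDT"):
    """Per row, elapsed time since the same entity's previous transaction (None if first).

    Rows are visited in timestamp order, so each value depends only on earlier
    transactions of the same entity. Key names are assumptions for illustration.
    """
    last_seen = {}
    deltas = [None] * len(rows)
    for i in sorted(range(len(rows)), key=lambda j: rows[j][time_key]):
        entity, t = rows[i][entity_key], rows[i][time_key]
        if entity in last_seen:
            deltas[i] = t - last_seen[entity]
        last_seen[entity] = t
    return deltas


if __name__ == "__main__":
    labels = [0] * 96 + [1] * 4  # a toy 4% fraud rate
    print("pos_weight:", positive_class_weight(labels))  # 24.0
    rows = [
        {"card1": 1, "TransactionDT": 100},
        {"card1": 2, "TransactionDT": 50},
        {"card1": 1, "TransactionDT": 160},
        {"card1": 1, "TransactionDT": 130},
    ]
    print(time_since_previous(rows))  # [None, None, 30, 30]
```

In PyTorch, the same weighting is available through the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`. Because the temporal feature is computed per entity in timestamp order, it uses only each card's past transactions and does not leak future information.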

Key Points

Achieved AUC 0.87, PR-AUC 0.52, recall@5% 0.57, precision@5% 0.37 on the IEEE-CIS dataset
Used GCN, GraphSAGE, and GAT architectures, all with similarly weak performance
Heterogeneous graph built with transaction features (device, ID, amount) as nodes linked to transaction nodes
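To make the reported numbers concrete, here is a minimal, dependency-free sketch of the four metrics. It assumes "recall@5%" and "precision@5%" mean flagging the top 5% of transactions by model score; some papers instead report recall at a fixed false-positive rate, and the article does not specify which definition was used. Average precision serves as the PR-AUC estimate here (other implementations integrate the PR curve with the trapezoidal rule), and tied scores are not handled specially. The function names are hypothetical; for real evaluations, `sklearn.metrics.roc_auc_score` and `sklearn.metrics.average_precision_score` are the usual choices.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Quadratic pairwise comparison (ties count as half); fine for a sketch only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision, one common PR-AUC estimate; assumes no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each position where a positive appears
    return total / sum(labels)


def precision_recall_at_fraction(scores, labels, fraction=0.05):
    """Flag the top `fraction` of transactions by score; return (precision, recall)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    true_positives = sum(y for _, y in ranked[:k])
    return true_positives / k, true_positives / sum(labels)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
    print("AUC:", roc_auc(scores, labels))
    print("PR-AUC (AP):", average_precision(scores, labels))
    print("P/R @ top 20%:", precision_recall_at_fraction(scores, labels, 0.2))
```

Unlike ROC AUC, whose random baseline is 0.5, the PR-AUC of a random scorer is roughly the positive-class rate, which is why PR-AUC is usually the more telling number on heavily imbalanced fraud data.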

Why It Matters

Highlights the core challenges of applying GNNs to real-world fraud detection, where SOTA performance remains elusive.
