Research & Papers

SilIF boosts fraud detection AUC-PR by 0.008 over plain Isolation Forest

New unsupervised method adds silhouette scoring to Isolation Forest for a small but consistent AUC-PR gain on IEEE-CIS fraud data.

Deep Dive

Unsupervised anomaly detection is critical for transaction fraud detection, where labeled data is scarce. Isolation Forest (IF) is popular because it scales well, but it scores each point by its path length averaged over all trees, and that single number can miss subtle structural patterns. Venkatakrishnan Gopalakrishnan introduces SilIF, which adds a silhouette-based scoring layer on top of IF. For each data point, SilIF extracts the vector of per-tree path lengths from the forest and clusters these 'fingerprints' into structural groups. A silhouette score then measures how well the point fits its assigned group compared with the nearest alternative group. This signal is combined with the base IF score through a single hyperparameter, alpha, giving a tunable enhancement that requires no additional labels.
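The pipeline described above can be sketched in Python with scikit-learn. This is a minimal sketch under stated assumptions, not the author's released code: using k-means (with a hypothetical `n_clusters=8`) to group the fingerprints, and the rule for mixing the silhouette into the IF score, are both assumptions, since the exact clustering method and combination formula are not given here. The helper names `average_path_length`, `per_tree_path_lengths` and `silif_scores` are hypothetical. The per-tree depths follow the standard Isolation Forest definition (edges from root to leaf, plus the c(n) correction for samples left unsplit in a leaf), rebuilt from scikit-learn's public `estimators_`, `estimators_features_`, `apply` and `decision_path`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.metrics import silhouette_samples


def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    n = np.asarray(n, dtype=float)
    out = np.zeros_like(n)
    out[n == 2] = 1.0
    big = n > 2
    out[big] = 2.0 * (np.log(n[big] - 1.0) + np.euler_gamma) - 2.0 * (n[big] - 1.0) / n[big]
    return out


def per_tree_path_lengths(forest, X):
    """Hypothetical helper: (n_samples, n_trees) matrix of isolation depths, one column per tree."""
    X = np.asarray(X, dtype=np.float32)  # IsolationForest trees are fit on float32 data
    columns = []
    for tree, features in zip(forest.estimators_, forest.estimators_features_):
        X_sub = X[:, features]
        leaves = tree.apply(X_sub)
        # Nodes on the root-to-leaf path; subtracting 1 converts nodes to edges.
        nodes_on_path = np.asarray(tree.decision_path(X_sub).sum(axis=1)).ravel()
        # A leaf may still hold several training samples; add c(n) for the unbuilt subtree.
        leaf_sizes = tree.tree_.n_node_samples[leaves]
        columns.append(nodes_on_path - 1.0 + average_path_length(leaf_sizes))
    return np.column_stack(columns)


def silif_scores(X, alpha=1.0, n_clusters=8, n_estimators=100, random_state=0):
    """Hypothetical SilIF-style scorer; higher means more anomalous."""
    forest = IsolationForest(n_estimators=n_estimators, random_state=random_state).fit(X)
    # sklearn's score_samples returns the negated anomaly score, so flip the sign.
    base = -forest.score_samples(X)
    fingerprints = per_tree_path_lengths(forest, X)
    # Assumption: k-means on the fingerprints stands in for the paper's clustering step.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit_predict(fingerprints)
    sil = silhouette_samples(fingerprints, labels)  # in [-1, 1]; low = poor fit to its group
    # Assumed combination rule: rescale silhouette to a [0, 1] "misfit" and weight it by alpha.
    return base + alpha * (1.0 - sil) / 2.0


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 4)), rng.normal(6.0, 1.0, size=(10, 4))])
scores = silif_scores(X, alpha=1.0)
print(np.argsort(scores)[-10:])  # indices of the 10 highest-scoring points
```

Under this assumed rule, alpha=0 reproduces plain Isolation Forest scores exactly, and larger alpha gives more weight to points that fit their structural group poorly. `silhouette_samples` computes all pairwise distances, so on a dataset the size of IEEE-CIS you would need to subsample or approximate the silhouette.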

On the IEEE-CIS Fraud Detection benchmark (~590K transactions, 3.5% fraud rate), SilIF with alpha=1.0 improved AUC-PR over plain Isolation Forest by +0.0080 on average across five random seeds (an absolute gain of 0.8 percentage points), outperforming it on every seed (paired t-test p=0.046). On the synthetic Sparkov credit-card dataset, however, the silhouette augmentation did not improve performance. The paper is candid about when SilIF helps and when it does not, which makes it a practical, easy-to-deploy option for teams already using Isolation Forest. The code is publicly available, so it can be integrated quickly into existing fraud detection pipelines.
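The evaluation protocol in the paragraph above (AUC-PR per seed, mean gain, win count, paired t-test) can be sketched as follows. It assumes AUC-PR is estimated with scikit-learn's `average_precision_score`, which is one common estimator (the paper may compute it differently), and that the labels stay fixed across seeds while model randomness varies; `compare_over_seeds` is a hypothetical helper. The gain it reports is an absolute difference in AUC-PR, which is how a figure like +0.0080 should be read.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.metrics import average_precision_score


def compare_over_seeds(y_true, baseline_scores, candidate_scores):
    """Hypothetical helper: compare two detectors' per-seed AUC-PR (average precision)."""
    ap_base = np.array([average_precision_score(y_true, s) for s in baseline_scores])
    ap_cand = np.array([average_precision_score(y_true, s) for s in candidate_scores])
    gains = ap_cand - ap_base  # absolute AUC-PR difference for each seed
    _, p_value = ttest_rel(ap_cand, ap_base)  # paired test: same seed, two methods
    return {
        "mean_gain": float(gains.mean()),
        "wins": int((gains > 0).sum()),
        "p_value": float(p_value),
    }
```

Here `baseline_scores` and `candidate_scores` would each be a list with one anomaly-score array per seed, for example plain IF scores and SilIF scores from the same five seeds. SciPy's `ttest_rel` is two-sided by default.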

Key Points
  • SilIF adds a silhouette-based scoring layer to Isolation Forest using per-tree path length vectors.
  • On IEEE-CIS 590K transaction dataset, it achieved +0.0080 AUC-PR improvement over plain IF across all five seeds.
  • Tunable via a single hyperparameter, alpha, and requires no labels, making it a good fit for unsupervised fraud detection, though it gave no gain on the synthetic Sparkov dataset.

Why It Matters

Better unsupervised fraud detection catches more fraudulent transactions without costly labeled data, which is critical for financial institutions.