Research & Papers

Synthetic Tabular Generators Fail to Preserve Behavioral Fraud Patterns: A Benchmark on Temporal, Velocity, and Multi-Account Signals

Popular synthetic-data generators like CTGAN and TVAE miss critical behavioral fraud signals, with pattern fidelity degrading by up to 99x relative to real data.

Deep Dive

A new research paper by Bhavana Sajja, 'Synthetic Tabular Generators Fail to Preserve Behavioral Fraud Patterns,' introduces a critical third dimension for evaluating synthetic data called 'behavioral fidelity.' While current methods check statistical fidelity (distributions) and downstream utility (model performance), they ignore the temporal, sequential, and structural patterns that define real-world entity behavior. The study formalizes four key fraud patterns (P1-P4) covering inter-event timing, burst activity, multi-account graph motifs, and velocity rules. It proves that dominant 'row-independent' generators are structurally incapable of reproducing these complex behavioral signals, regardless of architecture or data size.
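To make the idea of behavioral fidelity concrete, here is a minimal sketch of what a check on pattern P1 (inter-event timing) might look like: extract per-entity gaps between consecutive events and compare their distribution in real versus synthetic data. The helper names, toy data, and choice of a Kolmogorov-Smirnov distance are illustrative assumptions, not the paper's exact evaluation framework.

```python
import numpy as np

def inter_event_times(timestamps, entity_ids):
    """Per-entity gaps between consecutive events: the P1 signal a
    generator must preserve. Hypothetical helper; shapes are illustrative."""
    gaps = []
    for e in np.unique(entity_ids):
        t = np.sort(timestamps[entity_ids == e])
        if len(t) > 1:
            gaps.extend(np.diff(t))
    return np.asarray(gaps, dtype=float)

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Toy data: real entities emit tightly spaced events, while a
# row-independent generator scatters timestamps uniformly,
# destroying the per-entity gap structure.
rng = np.random.default_rng(0)
real_ts = np.concatenate([base + np.cumsum(rng.exponential(1.0, 50))
                          for base in (0, 1000, 2000)])
real_ids = np.repeat([0, 1, 2], 50)
synth_ts = rng.uniform(real_ts.min(), real_ts.max(), real_ts.size)
synth_ids = rng.integers(0, 3, real_ts.size)

d = ks_distance(inter_event_times(real_ts, real_ids),
                inter_event_times(synth_ts, synth_ids))
print(f"P1 inter-event KS distance: {d:.3f}")
```

A large distance here signals exactly the failure mode the paper formalizes: marginal column distributions can match while the temporal structure within entities is lost.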

Sajja benchmarked four major generators—CTGAN, TVAE, GaussianCopula, and TabularARGN—on the IEEE-CIS Fraud Detection and Amazon Fraud datasets. The results were stark: all failed severely. On IEEE-CIS, composite degradation ratios ranged from 24.4x worse than real data (TVAE) to 39.0x (GaussianCopula). On the Amazon dataset, row-independent generators scored between 81.6x and 99.7x degradation, while even the more advanced TabularARGN still showed 17.2x degradation. The paper documents specific failure modes and releases an open-source evaluation framework. This work has immediate implications for any domain using synthetic tabular data, from finance and healthcare to network security, where missing these behavioral fingerprints can render AI detection systems ineffective.
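The summary does not spell out how the composite ratio is built, but one plausible shape is a per-pattern fidelity error on synthetic data divided by the same error measured on a held-out real sample, aggregated across P1-P4. The functions below are a hedged sketch under that assumption; the aggregation by geometric mean and all numbers are illustrative, not the paper's formula or results.

```python
import math

def degradation_ratio(real_err, synth_err, eps=1e-9):
    """Pattern-fidelity error on synthetic data relative to the real-data
    baseline. A ratio near 1.0 means the generator matches real-data
    variability; 24x-99x means the behavioral signal is largely destroyed.
    Hypothetical formulation for illustration only."""
    return (synth_err + eps) / (real_err + eps)

def composite(ratios):
    """One way to aggregate per-pattern ratios (P1-P4): a geometric mean,
    so no single pattern dominates. An assumption, not the paper's metric."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Illustrative numbers only: one pattern, then four combined.
print(degradation_ratio(0.02, 0.78))
print(composite([24.4, 31.0, 39.0, 28.5]))
```

Framed this way, the headline numbers are easy to read: a 39.0x composite means the generator's behavioral error is roughly forty times the noise floor of real data itself.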

Key Points
  • Introduces 'behavioral fidelity' as a new, critical evaluation dimension for synthetic tabular data, beyond just statistics and utility.
  • Benchmarks show popular generators (CTGAN, TVAE) fail to preserve fraud patterns, with composite degradation ratios of 24x to 99x relative to real data.
  • Proves row-independent generators are structurally incapable of replicating key behavioral signals like multi-account graph motifs (P3).

Why It Matters

AI systems trained on flawed synthetic data may miss critical fraud patterns, creating major security and financial risks.