GATD achieves SOTA tabular synthesis with 3.5x fewer parameters
Explicit relational supervision cuts Shape error by 27% and Trend error by 20% in tabular diffusion.
Tabular data synthesis is crucial for privacy-preserving data sharing, but diffusion models are usually left to learn inter-column relationships implicitly, and they often struggle to capture them. David Turtora Zagardo's new paper introduces Geometry-Aware Tabular Diffusion (GATD), a method that injects explicit relational supervision into the denoising process. GATD computes pairwise angles and lengths from column value differences and uses them both as additional inputs and as auxiliary prediction targets. This simple inductive bias forces the model to learn column relationships directly, rather than relying on implicit representations.
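The summary above does not give the paper's exact formulas, so the following is only a rough sketch of one plausible reading, not GATD's actual implementation. It assumes a matrix `delta` of per-column value differences (what the differences are taken against is left open here), treats each column pair as a 2D vector, and derives an angle and a length from it. The function names `pairwise_geometry` and `auxiliary_loss`, and the squared-error form of the loss, are hypothetical.

```python
import numpy as np

def pairwise_geometry(delta):
    """Hypothetical helper: angles and lengths for every column pair.

    delta: array of shape (n_rows, n_cols) holding per-column value
           differences (an assumption; the source does not define them).
    Returns (angles, lengths), each of shape (n_rows, n_pairs), with pairs
    ordered (0,1), (0,2), ..., (n_cols-2, n_cols-1).
    """
    delta = np.asarray(delta, dtype=float)
    i, j = np.triu_indices(delta.shape[1], k=1)  # all pairs with i < j
    di, dj = delta[:, i], delta[:, j]
    angles = np.arctan2(dj, di)   # direction of the 2D vector (di, dj)
    lengths = np.hypot(di, dj)    # Euclidean norm of (di, dj)
    return angles, lengths

def auxiliary_loss(pred_angles, pred_lengths, true_angles, true_lengths):
    """Hypothetical auxiliary objective: squared error on both targets.

    Angle errors are wrapped into [-pi, pi] so that predictions near +pi
    and -pi are treated as close.
    """
    diff = pred_angles - true_angles
    wrapped = np.arctan2(np.sin(diff), np.cos(diff))
    return float(np.mean(wrapped ** 2) + np.mean((pred_lengths - true_lengths) ** 2))

# Toy example: one row, three columns.
delta = np.array([[3.0, 4.0, 0.0]])
angles, lengths = pairwise_geometry(delta)
```

In a setup like this, the geometric targets would be added to the usual denoising loss with some weight, so the denoiser is penalized when it gets the relationships between columns wrong even if each column looks fine on its own.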
GATD's MLP instantiation achieves state-of-the-art benchmark performance across ten datasets while using 3.5x fewer parameters on average, with reductions of up to 25x on classification tasks. It wins 8/10 Shape, 7/10 Trend, and 9/10 downstream utility (F1/RMSE) comparisons, and reduces Shape error by 27% and Trend error by 20%. The approach also transfers to GNN and Transformer denoisers, improving Shape on 27/30 and Trend on 25/30 architecture-dataset cells. Ablation studies confirm that the auxiliary supervision, rather than the extra inputs or added capacity, drives the gains, which establishes explicit relational supervision as a portable inductive bias for tabular diffusion.
- GATD uses pairwise angles and lengths from column value differences as inputs and auxiliary targets for tabular denoisers.
- MLP instantiation uses 3.5x fewer parameters on average (up to 25x for classification) while achieving SOTA.
- Wins 8/10 Shape, 7/10 Trend, and 9/10 downstream utility benchmarks across 10 datasets; reduces Shape error by 27% and Trend error by 20%.
Why It Matters
New inductive bias for tabular diffusion enables efficient, high-quality synthetic data generation.