Research & Papers

Synthetic Designed Experiments for Diagnosing Vision Model Failure

New method boosts accuracy from 49.9% to 79.0% by diagnosing failure types

Deep Dive

Current synthetic data pipelines for computer vision generate images by random sampling, treating synthetic data as cheap real data rather than diagnosing what the downstream model actually needs. Krisanu Sarkar's paper, submitted to CVPR SynData4CV 2026, proposes a fundamentally different approach: Synthetic Designed Experiments for Representational Sufficiency (SDRS). SDRS draws on the statistical theory of Design of Experiments (DoE), treating the vision model as a black-box system and the synthetic generator as an experimental apparatus. Using fractional factorial designs, it efficiently audits the model's factor-sensitivity profile via ANOVA decomposition and categorizes failures into two actionable types: Type I gaps (coverage failures on underrepresented factor levels) and Type II gaps (reliance on spurious nuisance dependencies). The audit then prescribes targeted synthetic data to address each specific gap.
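To make the audit concrete, here is a minimal sketch of an ANOVA-style main-effects decomposition over a factorial design. The factor names, scores, and full-factorial grid are illustrative assumptions, not the paper's actual setup (SDRS uses fractional designs to keep the number of runs small); the idea is that a large variance share on a nuisance factor flags a Type II gap, while a single low-scoring level of a task factor flags a Type I coverage gap.

```python
import numpy as np

def anova_main_effects(levels, scores):
    """Per-factor main-effect variance shares from a factorial audit.

    levels: (n_runs, n_factors) int array, the factor level of each run
    scores: (n_runs,) accuracy of the black-box model on each run's batch
    Returns one between-level variance fraction per factor.
    """
    grand = scores.mean()
    total = ((scores - grand) ** 2).sum()
    shares = []
    for f in range(levels.shape[1]):
        ss = 0.0
        for lv in np.unique(levels[:, f]):
            mask = levels[:, f] == lv
            # Between-level sum of squares for this factor.
            ss += mask.sum() * (scores[mask].mean() - grand) ** 2
        shares.append(ss / total if total > 0 else 0.0)
    return shares

# Hypothetical audit: factor 0 = object scale (task-relevant, 3 levels),
# factor 1 = background texture (nuisance, 2 levels).
levels = np.array([[a, b] for a in range(3) for b in range(2)])
scores = np.array([0.95, 0.60, 0.96, 0.62, 0.94, 0.61])  # drops when b == 1

shares = anova_main_effects(levels, scores)
# Nearly all variance loads on the nuisance factor: a Type II gap.
```

In a real run the scores would come from evaluating the trained model on generator batches at each design point, and the design matrix would be a fractional factorial rather than the full grid shown here.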

SDRS is validated across three experiments. First, on the controlled dSprites dataset with planted biases, the audit correctly identifies both gap types, and the targeted data it prescribes improves accuracy from 49.9% to 79.0%. Second, in dense segmentation on procedural scenes, the audit detects background-complexity shortcuts, and targeted data raises mean IoU from 0.948 to 0.998. Third, an entanglement-detection experiment shows that the ANOVA audit can identify cross-factor contamination in imperfect generators. The paper also finds that per-factor invariance penalties can transfer sensitivity between factors, leaving representation-level correction as an open problem. This work offers a principled, efficient method to make synthetic data generation truly useful for diagnosing and fixing vision model failures.
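The prescription step can be sketched as a simple mapping from audit findings to generation requests. The spec format, factor names, and `prescribe` function below are hypothetical illustrations of the idea, not the paper's interface: a Type I finding triggers oversampling of the underrepresented level, while a Type II finding triggers a sweep of the nuisance factor with labels held fixed, pushing the model toward invariance.

```python
def prescribe(findings, n_per_gap=100):
    """Map diagnosed gaps to targeted synthetic-batch specs (illustrative).

    findings: list of dicts, e.g.
      {"type": 1, "factor": "scale", "level": "small"}          # coverage gap
      {"type": 2, "factor": "background", "levels": [...]}      # spurious dep.
    """
    specs = []
    for gap in findings:
        if gap["type"] == 1:
            # Type I: oversample the underrepresented factor level.
            specs.append({"factor": gap["factor"],
                          "fix": gap["level"],
                          "n": n_per_gap})
        else:
            # Type II: sweep the nuisance factor, labels unchanged,
            # so the model is trained to ignore it.
            specs.extend({"factor": gap["factor"], "fix": lv,
                          "n": n_per_gap // len(gap["levels"])}
                         for lv in gap["levels"])
    return specs

specs = prescribe([
    {"type": 1, "factor": "scale", "level": "small"},
    {"type": 2, "factor": "background", "levels": ["plain", "textured"]},
])
```

Each spec would then be handed to the synthetic generator to render the requested batch before retraining.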

Key Points
  • SDRS uses fractional factorial designs to audit a vision model's factor sensitivity via ANOVA, yielding a 29.1-percentage-point accuracy gain on dSprites (49.9% to 79.0%).
  • Two failure types identified: Type I (coverage gaps) and Type II (spurious dependencies), enabling targeted synthetic data prescriptions.
  • Segmentation mIoU improved from 0.948 to 0.998 by detecting and fixing background-complexity shortcuts in procedural scenes.

Why It Matters

Transforms synthetic data generation from random sampling into systematic diagnosis, enabling more reliable and explainable computer vision models.