RTM: Recursive latent refinement boosts image diversity and quality
Beyond one-pass latent mapping: recursive refinement improves both precision and recall.
A new arXiv paper from Mehdi Esmaeilzadeh and co-authors argues that current generative model evaluation is broken. The dominant metric, FID (Fréchet Inception Distance), is nearly saturated and often masks mode collapse: a model can produce a few sharp, near-duplicate images and still score well. To address the underlying collapse, the authors propose RTM (Recursive Latent Refinement), which replaces the single-pass latent mapping used in style-based generators with an iterative refinement process. By repeatedly updating the latent code with gradient descent on a learned energy function, RTM encourages the generator to reach modes that a single pass would otherwise miss.
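The refinement step described above is repeated gradient descent on an energy defined over the latent code. The sketch below shows that loop under loudly stated assumptions: a toy two-well energy (the hypothetical `CENTERS`) stands in for the trained energy network, the step size `lr` and step count are illustrative rather than taken from the paper, and the generator that would decode the refined latent is omitted. It is not the authors' implementation.

```python
import math

# HYPOTHETICAL stand-in for a learned energy network: two Gaussian wells.
# Low energy marks latent regions assumed to decode to realistic images.
CENTERS = [(-2.0, 0.0), (2.0, 0.0)]


def _log_terms(z):
    # Log of each well's unnormalised density at z.
    return [-0.5 * sum((zi - ci) ** 2 for zi, ci in zip(z, c)) for c in CENTERS]


def energy(z):
    """E(z) = -log sum_k exp(-||z - c_k||^2 / 2), computed with log-sum-exp."""
    a = _log_terms(z)
    m = max(a)
    return -(m + math.log(sum(math.exp(x - m) for x in a)))


def grad_energy(z):
    """dE/dz = sum_k w_k (z - c_k), where w_k are the softmax weights of the wells."""
    a = _log_terms(z)
    m = max(a)
    e = [math.exp(x - m) for x in a]
    s = sum(e)
    g = [0.0] * len(z)
    for ek, c in zip(e, CENTERS):
        w = ek / s
        for i, (zi, ci) in enumerate(zip(z, c)):
            g[i] += w * (zi - ci)
    return g


def refine(z, steps=20, lr=0.2):
    """Recursive refinement: z_{t+1} = z_t - lr * dE/dz(z_t), repeated `steps` times."""
    for _ in range(steps):
        z = [zi - lr * gi for zi, gi in zip(z, grad_energy(z))]
    return z


# A single-pass mapping would stop at z0; refinement moves it into a low-energy well.
z0 = [0.5, 1.0]
z_final = refine(z0)
print(energy(z0), energy(z_final))
```

A starting point on the other side of the saddle, such as `[-0.5, 1.0]`, settles into the other well, so the iterated update can land in different modes depending on where it starts.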
RTM is integrated with IMLE (Implicit Maximum Likelihood Estimation), a training framework that explicitly optimizes for mode coverage rather than sample fidelity. The result: on CIFAR-10, CelebA-HQ (256×256), and nine few-shot benchmarks, RTM achieves the highest precision and recall among the state-of-the-art models compared, while keeping FID competitive. The method also improves older architectures, StyleGAN2 and StyleGAN2-ADA, on AFHQ-v1 (512×512). Unlike flow-matching baselines, which sacrifice coverage to lower FID, RTM improves quality and diversity simultaneously. The authors take this as evidence that a one-pass approach is fundamentally limiting for image generation.
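Instead of scoring each generated sample, IMLE works from the data side: for every data point it draws several samples, finds the nearest one, and pulls that sample closer. A minimal sketch of this basic objective follows, assuming a toy 1-D linear generator `G(z) = a*z + b`; the names `nearest_indices` and `train_imle`, the sample count `m`, the learning rate, and the data are all hypothetical, and this is not RTM's training code.

```python
import random


def nearest_indices(data, samples):
    """IMLE matching step: for each data point, the index of the closest sample."""
    return [min(range(len(samples)), key=lambda j: (samples[j] - x) ** 2) for x in data]


def train_imle(data, a=0.1, b=0.0, steps=200, m=10, lr=0.05, seed=0):
    """Fit a toy generator G(z) = a*z + b by pulling each data point's nearest sample toward it."""
    rng = random.Random(seed)
    for _ in range(steps):
        zs = [rng.gauss(0.0, 1.0) for _ in range(m)]
        samples = [a * z + b for z in zs]
        grad_a = grad_b = 0.0
        for x, j in zip(data, nearest_indices(data, samples)):
            residual = samples[j] - x  # error of the matched sample only
            # Gradient of the mean squared distance (G(z*) - x)^2 w.r.t. a and b.
            grad_a += 2 * residual * zs[j] / len(data)
            grad_b += 2 * residual / len(data)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


a, b = train_imle([-2.0, 2.0])
print(f"a={a:.2f}, b={b:.2f}")
```

The generator starts nearly collapsed (`a = 0.1`, so every sample sits near 0). Both data points pull their nearest samples outward, so `a` grows until samples reach both of them: no data point is left without a nearby sample, which is the coverage property the article attributes to IMLE.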
- RTM replaces single-pass latent mapping with iterative refinement, improving both precision and recall.
- Combined with IMLE, RTM achieves state-of-the-art precision and recall on CIFAR-10, CelebA-HQ (256×256), and nine few-shot benchmarks.
- RTM also improves StyleGAN2 and StyleGAN2-ADA on AFHQ-v1 (512×512), suggesting it applies beyond IMLE.
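Precision and recall separate the two failure modes that a single FID number blends together. The sketch below uses the common k-nearest-neighbour formulation: a generated sample counts toward precision if it falls inside some real sample's k-NN ball, and a real sample counts toward recall if it falls inside some generated sample's ball. The paper's exact protocol may differ; in practice the points would be image embeddings from a pretrained network, whereas here hypothetical 1-D toy points stand in for them.

```python
import math


def knn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour within the same set."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def coverage(queries, refs, k):
    """Fraction of queries lying inside at least one reference point's k-NN ball."""
    radii = knn_radii(refs, k)
    hits = sum(any(math.dist(q, r) <= rad for r, rad in zip(refs, radii)) for q in queries)
    return hits / len(queries)


def precision_recall(real, fake, k=3):
    precision = coverage(fake, real, k)  # are generated samples realistic?
    recall = coverage(real, fake, k)     # is the real distribution covered?
    return precision, recall


mode_a = [(-2.0 + 0.1 * i,) for i in range(5)]
mode_b = [(2.0 + 0.1 * i,) for i in range(5)]
real = mode_a + mode_b
collapsed = [(x + 0.05,) for (x,) in mode_b]  # sharp near-duplicates of one mode
diverse = [(x + 0.05,) for (x,) in real]      # samples from both modes
print(precision_recall(real, collapsed))  # (1.0, 0.5)
print(precision_recall(real, diverse))    # (1.0, 1.0)
```

The collapsed generator keeps perfect precision while recall halves, which is the kind of gap the article says FID can hide.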
Why It Matters
RTM tackles mode collapse directly, raising diversity (recall) without giving up fidelity (precision) or FID. That matters for production systems that need varied outputs rather than a handful of polished near-duplicates.