RTM: Recursive latent refinement boosts image diversity and quality

arXiv cs.CV May 18, 2026

⚡One-pass generation is outdated: recursive refinement improves both precision and recall.

Deep Dive

A new arXiv paper from Mehdi Esmaeilzadeh and co-authors argues that current generative model evaluation is broken. The dominant metric, FID (Fréchet Inception Distance), is nearly saturated and often masks mode collapse: a model can produce a few sharp near-duplicate images and still score well. To address mode collapse directly, the authors propose RTM, a recursive latent refinement method that replaces the standard single-pass latent mapping in style-based generators with an iterative refinement process. By repeatedly updating the latent code using gradient descent on a learned energy function, RTM encourages the generator to explore modes that would otherwise be missed.
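The refinement loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the energy here is a toy quadratic standing in for a learned energy function, and the names `toy_energy`, `toy_energy_grad`, `refine_latent`, `steps`, and `step_size` are all hypothetical. It only shows the general shape of the idea, starting from a single-pass latent code and applying several gradient-descent updates to it.

```python
import numpy as np

def toy_energy(w, center):
    """Toy stand-in for a learned energy: half the squared distance to `center`."""
    return 0.5 * float(np.sum((w - center) ** 2))

def toy_energy_grad(w, center):
    """Analytic gradient of toy_energy with respect to the latent code w."""
    return w - center

def refine_latent(w0, center, steps=10, step_size=0.3):
    """Apply `steps` gradient-descent updates to the latent code, starting at w0."""
    w = np.array(w0, dtype=float)  # copy so the caller's array is not modified
    for _ in range(steps):
        w = w - step_size * toy_energy_grad(w, center)
    return w

rng = np.random.default_rng(0)
w_single = rng.standard_normal(8)   # latent code from a (hypothetical) single pass
center = np.ones(8)                 # minimum of the toy energy
w_refined = refine_latent(w_single, center)

print("energy before refinement:", toy_energy(w_single, center))
print("energy after refinement: ", toy_energy(w_refined, center))
```

In a real model the gradient would come from automatic differentiation through a trained network rather than from a closed-form expression, and the number of refinement steps would be a design choice of the method.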

RTM is integrated with IMLE, a training framework that explicitly optimizes for mode coverage rather than sample fidelity. The result: on CIFAR-10, CelebA-HQ (256×256), and nine few-shot benchmarks, RTM achieves the highest precision and recall among current state-of-the-art models while keeping FID competitive. The method also improves older architectures such as StyleGAN2 and StyleGAN2-ADA on AFHQ-v1 (512×512). Unlike flow-matching baselines that sacrifice coverage for FID, RTM improves quality and diversity at the same time. The work suggests that single-pass latent mapping is a fundamental limitation for image generation.
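To make the idea of optimizing for mode coverage concrete, here is a minimal sketch of a nearest-sample objective in the spirit of IMLE: every data point is matched to its closest generated sample, and the average of those distances is the loss. This assumes the commonly described form of the objective, not the exact variant used in the paper, and the function name `imle_loss` and the toy data are hypothetical.

```python
import numpy as np

def imle_loss(data, samples):
    """Mean squared distance from each data point to its nearest generated sample.

    data:    array of shape (n_data, dim)
    samples: array of shape (n_samples, dim)
    Returns (loss, nearest), where nearest[i] is the index of the sample
    closest to data[i].
    """
    # Pairwise squared Euclidean distances, shape (n_data, n_samples).
    diffs = data[:, None, :] - samples[None, :, :]
    d2 = np.sum(diffs ** 2, axis=-1)
    nearest = np.argmin(d2, axis=1)
    loss = float(np.mean(d2[np.arange(len(data)), nearest]))
    return loss, nearest

# Two data modes, far apart.
data = np.array([[0.0, 0.0], [10.0, 10.0]])

# A collapsed generator: every sample sits near the first mode.
collapsed = np.array([[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]])

# A generator that covers both modes.
covering = np.array([[0.1, 0.0], [9.9, 10.0], [0.0, 0.1]])

collapsed_loss, _ = imle_loss(data, collapsed)
covering_loss, covering_nearest = imle_loss(data, covering)
print("collapsed loss:", collapsed_loss)
print("covering loss: ", covering_loss)
```

Because every data point must have a nearby sample, a generator that ignores a mode is penalized no matter how sharp its other samples are, which is the coverage property the article attributes to IMLE.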

Key Points

RTM replaces single-pass latent mapping with iterative refinement, improving both precision and recall.
Combined with IMLE, RTM achieves state-of-the-art diversity on CIFAR-10, CelebA-HQ (256×256), and nine few-shot benchmarks.
RTM also improves StyleGAN2 and StyleGAN2-ADA on AFHQ-v1 (512×512), demonstrating broad applicability.

Why It Matters

This approach directly tackles mode collapse in generative AI, enabling more diverse and faithful image generation for production systems.
