Research & Papers

[D] Why do people say that GANs are dead or outdated when they're still commonly used?

Despite claims of obsolescence, adversarially trained autoencoders underpin virtually every major image and audio generation model.

Deep Dive

A recent viral post on the r/MachineLearning subreddit has sparked a technical debate by pushing back against the narrative that Generative Adversarial Networks (GANs) are obsolete. The original poster, an AI practitioner working in image and audio generation, strongly contested claims that GANs are a 'dated concept,' arguing they remain fundamentally indispensable.

The core technical argument is that virtually every leading diffusion model—including Stable Diffusion, Midjourney's underlying architecture, and the newer Flux model—uses a frozen, pre-trained autoencoder to compress images into a latent space for efficient processing. These autoencoders are almost universally trained using adversarial (GAN) objectives, not diffusion. Key examples cited include the VAE in Stable Diffusion 1.5/XL and the 'Flux VAE.' The same principle applies to modern audio generation models. The poster asserts it's 'impossible to get even close to SOTA' without this GAN-based component, framing it as the essential 'wheel' upon which the 'car' of diffusion models is built.
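To make the adversarial-objective claim concrete, here is a minimal, hypothetical sketch of how such an autoencoder might be trained: a reconstruction loss plus a GAN loss from a patch discriminator, loosely in the spirit of the VQGAN/latent-diffusion recipe. All module sizes, loss weights, and architectures below are illustrative stand-ins, not the actual Stable Diffusion or Flux training code.

```python
# Hedged sketch: adversarial autoencoder training step (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):  # 3x64x64 image -> 4x8x8 latent (8x spatial compression)
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 4, 4, stride=2, padding=1),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):  # 4x8x8 latent -> 3x64x64 reconstruction
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(4, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )
    def forward(self, z):
        return self.net(z)

class PatchDiscriminator(nn.Module):  # one real/fake logit per image patch
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, padding=1),
        )
    def forward(self, x):
        return self.net(x)

def train_step(enc, dec, disc, opt_ae, opt_d, x, adv_weight=0.1):
    # 1) Autoencoder step: reconstruct the input AND fool the discriminator.
    z = enc(x)
    x_rec = dec(z)
    rec_loss = F.l1_loss(x_rec, x)
    g_loss = -disc(x_rec).mean()              # hinge-style generator loss
    opt_ae.zero_grad()
    (rec_loss + adv_weight * g_loss).backward()
    opt_ae.step()                             # updates encoder/decoder only
    # 2) Discriminator step: separate real images from reconstructions.
    d_real = disc(x)
    d_fake = disc(x_rec.detach())             # detach: no grads into the AE
    d_loss = F.relu(1 - d_real).mean() + F.relu(1 + d_fake).mean()
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    return rec_loss.item(), d_loss.item()
```

The adversarial term is what pushes reconstructions toward crisp, locally realistic texture rather than the blurry averages a plain L1/L2 objective tends to produce, which is the practical reason these autoencoders are GAN-trained.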

This debate matters because it corrects a common oversimplification in AI discourse. While diffusion models have surpassed pure GANs at generating highly detailed and diverse images from text, they have not replaced GANs so much as built on top of them. The adversarial training paradigm remains one of the most effective ways to learn compact, high-fidelity data representations. For practitioners, understanding this layered architecture is key to innovating in model design. The takeaway is that GANs have evolved from headline generator to a critical, behind-the-scenes engine, proving their enduring value in the AI stack.

Key Points
  • Modern diffusion models like Stable Diffusion and Flux rely on GAN-trained autoencoders as a frozen backbone for data compression.
  • The poster states it is 'impossible' to reach state-of-the-art (SOTA) results in image/audio generation without this foundational GAN component.
  • This reframes GANs not as outdated, but as a critical, embedded technology within contemporary AI pipelines.

Why It Matters

Understanding that AI progress is often additive rather than a series of clean replacements is crucial for accurate technical discourse and effective R&D.