Simple Self-Conditioning Adaptation for Masked Diffusion Models
A simple post-training tweak nearly halves generative perplexity with no extra compute at inference time.
Masked diffusion models (MDMs) generate discrete sequences by iteratively denoising under an absorbing masking process. A key limitation is that if a token remains masked after a reverse update, the model discards its clean-state prediction for that position, forcing still-masked positions to be repeatedly inferred from the mask token alone. This design choice limits cross-step refinement.
To address this, researchers propose Simple Self-Conditioning Adaptation for Masked Diffusion Models (SCMDM), which conditions each denoising step on the model's own previous clean-state predictions. The method requires minimal architectural change, adds no extra denoiser evaluations during sampling, and, unlike prior partial self-conditioning approaches, avoids expensive retraining from scratch. Evaluated across multiple domains, SCMDM nearly halves generative perplexity on OWT-trained models (42.89 to 23.72), with strong gains in discretized image synthesis, small-molecule generation, and genomic distribution fidelity.
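The core idea can be sketched with a toy sampler. Everything here is illustrative, not the paper's implementation: the `MASK` sentinel, the vocabulary, and `toy_denoiser` (a stand-in for the learned network) are assumptions. The one structural point it demonstrates is that `prev_x0` is carried forward between steps, so positions that stay masked are re-predicted with the previous clean-state guess available, rather than from the mask token alone.

```python
import random

MASK = -1            # sentinel id for the absorbing mask state (illustrative)
VOCAB = [0, 1, 2, 3]  # toy vocabulary

def toy_denoiser(xt, prev_x0, rng):
    """Stand-in for the learned denoiser: returns a clean-state guess for
    every position. A real MDM would be a transformer; here we just reuse
    the previous guess when one exists (the self-conditioning signal) and
    sample a fresh token otherwise."""
    return [
        xt[i] if xt[i] != MASK                      # already unmasked: keep it
        else (prev_x0[i] if prev_x0 is not None     # self-condition on the
              else rng.choice(VOCAB))               # prior clean-state guess
        for i in range(len(xt))
    ]

def sample(length=8, steps=4, seed=0):
    rng = random.Random(seed)
    xt = [MASK] * length   # start fully masked
    prev_x0 = None         # no prediction exists before the first step
    for step in range(steps):
        x0_hat = toy_denoiser(xt, prev_x0, rng)
        # Unmask a random subset of still-masked positions; the rest stay
        # masked, but x0_hat is carried forward instead of being discarded.
        for i in range(length):
            if xt[i] == MASK and rng.random() < 1.0 / (steps - step):
                xt[i] = x0_hat[i]
        prev_x0 = x0_hat   # <-- the self-conditioning change; a vanilla
                           #     MDM sampler would reset this to None
    # final step: commit whatever the last prediction says
    return [xt[i] if xt[i] != MASK else prev_x0[i] for i in range(length)]
```

A vanilla sampler differs by exactly one line (`prev_x0 = None` each iteration), which is why the adaptation needs no extra denoiser evaluations: the previous prediction is a byproduct of the step already being computed.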
- SCMDM reduces generative perplexity on OWT-trained models from 42.89 to 23.72 – nearly a 50% improvement.
- No extra denoiser evaluations during sampling and no need to retrain the model from scratch.
- Improves discretized image synthesis, small-molecule generation, and genomic distribution modeling.
Why It Matters
A free lunch for generative AI: better outputs with zero extra compute at inference time.