New DUS scheduler speeds masked diffusion models up to 5.8x with minimal quality loss
DUS partitions positions into dilated groups to unmask in parallel, beating confidence-based planners.
Masked diffusion language models (MDLMs) promise fast, non-autoregressive text generation. In practice, existing samplers that pick tokens by model confidence ignore interactions between positions unmasked in the same step, which effectively reduces them to slow, autoregressive-like decoding. A new paper from Ben-Gurion University introduces the Dilated Unmasking Scheduler (DUS), an inference-only method that needs no separate planner model. DUS partitions sequence positions into non-adjacent dilated groups and unmasks each group in parallel, so as to minimize an upper bound on joint entropy gain at each denoising step. By explicitly trading off the number of network calls against generation quality, DUS recovers most of the performance lost under traditional parallel unmasking strategies.
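The grouping idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes a single fixed stride, and the names `dilated_groups`, `decode_block`, `toy_denoiser`, and the `stride` parameter are hypothetical. The schedule DUS actually uses, and the way it chooses groups to bound entropy gain, may differ.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # stand-in for a masked position in this toy example


def dilated_groups(block_size: int, stride: int) -> List[List[int]]:
    """Split positions 0..block_size-1 into groups of non-adjacent positions.

    Group g holds positions g, g + stride, g + 2 * stride, ...
    Assumption: one fixed stride; the real scheduler may vary the dilation.
    """
    if block_size <= 0 or stride <= 0:
        raise ValueError("block_size and stride must be positive")
    return [list(range(g, block_size, stride)) for g in range(min(stride, block_size))]


def decode_block(
    block_size: int,
    stride: int,
    denoise: Callable[[List[Optional[str]]], List[str]],
) -> Tuple[List[Optional[str]], int]:
    """Unmask one block group by group and count denoiser calls."""
    tokens: List[Optional[str]] = [MASK] * block_size
    calls = 0
    for group in dilated_groups(block_size, stride):
        predictions = denoise(tokens)  # one network call serves the whole group
        calls += 1
        for pos in group:
            tokens[pos] = predictions[pos]
    return tokens, calls


def toy_denoiser(tokens: List[Optional[str]]) -> List[str]:
    """Hypothetical denoiser: returns a dummy prediction for every position."""
    return [f"tok{i}" for i in range(len(tokens))]


if __name__ == "__main__":
    print(dilated_groups(8, 4))  # [[0, 4], [1, 5], [2, 6], [3, 7]]
    tokens, calls = decode_block(8, 4, toy_denoiser)
    print(calls)  # 4 denoiser calls, versus 8 for one-token-at-a-time unmasking
```

In this toy setup a block of 8 positions is filled with 4 denoiser calls instead of 8, and no two positions unmasked in the same call are neighbors. The call count depends only on the block size and stride, not on model confidence, so the cost of decoding is known in advance.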
DUS was evaluated across diverse benchmarks including math (GSM8K, MATH500), code (HumanEval, MBPP), general knowledge (BBH, MMLU-Pro), and instruction following (IFEval). It outperforms confidence-based planners and turns the diffusion-specific quality-speed trade-off into a deterministic, predictable speedup set by the block size B, yielding up to 5.8x wall-clock speedup over token-by-token MDLM decoding without modifying the underlying denoiser. Applied as a drop-in post-filter, dilated spacing also improves adaptive samplers. The paper has been accepted at ICML 2026 and code is publicly available.
- DUS achieves up to 5.8x wall-clock speedup over token-by-token MDLM decoding.
- It partitions positions into dilated groups to parallelize unmasking while minimizing entropy gain.
- It is inference-only and needs no model retraining; applied as a drop-in post-filter, dilated spacing also improves adaptive samplers.
Why It Matters
DUS enables fast, high-quality text generation from masked diffusion models without retraining, which matters for latency-sensitive, real-time AI applications.