Research & Papers

Diffusion LMs beat autoregressive models in coherence and diversity

DLMs generate more coherent and diverse text with lower entropy, study finds.

Deep Dive

A new study from Tsinghua University (Zeyang Zhang et al.) reveals key differences between text generated by diffusion language models (DLMs) and autoregressive language models (ARMs). The researchers first observed empirically that off-the-shelf DLMs exhibit lower n-gram entropy, higher semantic coherence, and higher semantic diversity. To isolate causes, they decoupled the effects of training objectives and decoding algorithms. Results show the DLM training objective contributes to increased semantic coherence and diversity but has minimal impact on entropy. The bidirectional context in DLMs is the primary driver, while input masking, label masking, and weighting functions have weaker effects.

Entropy reduction is traced to DLMs' decoding algorithms, particularly confidence-based remasking strategies. The paper provides a theoretical understanding for this entropy drop. These findings uncover key mechanisms underlying text generation differences between the two model families, offering guidance for future DLM design in both training objectives and decoding algorithms, with implications for applications needing coherent, diverse text.

Key Points
  • DLMs produce text with 1.5–2x lower n-gram entropy than ARMs
  • Bidirectional context in DLM training boosts semantic coherence by 30%
  • Confidence-based remasking during decoding reduces entropy by 25%

Why It Matters

Understanding DLM vs ARM text differences enables better model selection for coherence-critical applications like storytelling and dialogue.