Research & Papers

FLUID Framework Bridges Autoregressive and Diffusion LLMs with Elastic Horizons

New technique reuses GPT-style checkpoints to cut diffusion training costs by orders of magnitude.

Deep Dive

A paper accepted at ACL 2026 introduces FLUID (From Autoregressive to Diffusion), a framework that bridges the structural gap between autoregressive (AR) and diffusion language models. Diffusion language models typically rely on bidirectional attention, whereas AR backbones such as GPT are trained with causal attention, so their checkpoints cannot be reused directly. FLUID addresses this with Strictly Causal Alignment, a technique that allows initialization from standard GPT-style checkpoints. This removes the need for large-scale pre-training from scratch, a hurdle that had made diffusion-based parallel generation cost-prohibitive.
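The structural gap between the two attention patterns can be illustrated with a minimal sketch. This is not FLUID's Strictly Causal Alignment, whose details are not given here; it only contrasts the causal mask a GPT-style checkpoint is trained under with the bidirectional mask typical of diffusion models. The helper names (`causal_mask`, `bidirectional_mask`, `is_strictly_causal`) are hypothetical and chosen for illustration.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def is_strictly_causal(mask):
    """True when no position attends to a later position."""
    n = len(mask)
    return all(not mask[i][j] for i in range(n) for j in range(i + 1, n))


if __name__ == "__main__":
    for row in causal_mask(4):
        print("".join("x" if allowed else "." for allowed in row))
    print(is_strictly_causal(causal_mask(4)))
    print(is_strictly_causal(bidirectional_mask(4)))
```

A model pre-trained under the first mask has never seen future tokens, which is why a diffusion setup that assumes the second mask cannot load such weights unchanged.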

FLUID also introduces Elastic Horizons, an entropy-driven mechanism that adjusts denoising strides dynamically according to local information density instead of following a fixed schedule. According to the paper, this makes generation more efficient and context-aware. The authors report state-of-the-art performance across benchmarks while reducing training costs by orders of magnitude relative to training a diffusion model from scratch. The approach combines the robust priors of AR models with the parallel efficiency of diffusion, offering a practical path to deploying fast, high-quality generative models. Code is available on GitHub.
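One way to picture an entropy-driven stride is sketched below. It is an illustrative assumption, not the paper's algorithm: the function `elastic_stride`, the `budget` threshold, and the `max_stride` cap are all hypothetical. The idea shown is that the stride grows while the predicted token distributions are confident (low entropy) and shrinks where they are uncertain (high entropy).

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def elastic_stride(dists, budget=1.0, max_stride=8):
    """Hypothetical rule: count how many upcoming positions fit in an entropy budget.

    dists holds one predicted probability distribution per upcoming position.
    The first position is always taken, so a non-empty input yields a stride >= 1.
    """
    total = 0.0
    stride = 0
    for probs in dists[:max_stride]:
        h = entropy(probs)
        if stride >= 1 and total + h > budget:
            break  # the next position would exceed the budget, so stop here
        total += h
        stride += 1
    return stride


if __name__ == "__main__":
    confident = [[0.97, 0.01, 0.01, 0.01]] * 4  # low entropy per position
    uncertain = [[0.25, 0.25, 0.25, 0.25]] * 4  # maximum entropy per position
    print(elastic_stride(confident))
    print(elastic_stride(uncertain))
```

Under these assumptions, predictable spans are covered in one long stride and information-dense spans in short ones, in contrast to a fixed schedule that uses the same stride everywhere.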

Key Points
  • FLUID enables direct initialization from GPT-style checkpoints via Strictly Causal Alignment, bypassing costly pre-training.
  • Elastic Horizons dynamically adjusts denoising strides based on local information density, improving generation efficiency.
  • Achieves state-of-the-art results while cutting training costs by orders of magnitude compared to training diffusion from scratch.

Why It Matters

FLUID makes diffusion-based LLMs more practical by reusing existing AR models, enabling faster and cheaper parallel text generation.