Open Source

ByteDance's Cola-DLM generates text via continuous latent diffusion

Diffusion language model matches GPT quality using latent-space generation...

Deep Dive

ByteDance's Seed team has open-sourced Cola-DLM (Continuous Latent Diffusion Language Model), an approach to text generation that operates in continuous latent space rather than over discrete token sequences. The architecture consists of two components: a Text VAE that encodes text into continuous latent sequences and decodes them back to tokens, and a block-causal Diffusion Transformer (DiT) that performs latent prior transport using Flow Matching. Training proceeds in two stages: the VAE is pretrained first, and then both components are fine-tuned jointly. The released checkpoint corresponds to the 2000 EFLOPs training-compute point on the paper's scaling curve and is presented as performing strongly on both quality and generation speed.
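
To make "block-causal" concrete, here is a minimal sketch of a block-causal attention mask in plain Python. It assumes the usual meaning of the term: positions attend freely within their own block and to all earlier blocks, but never to later ones. The function name, the block size, and the sequence length are illustrative choices, not values taken from the Cola-DLM release.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Assumption: attention is bidirectional inside a block and causal
    across blocks. The released model's exact masking may differ.
    """
    if seq_len < 0 or block_size <= 0:
        raise ValueError("seq_len must be >= 0 and block_size must be > 0")
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Hypothetical sizes chosen only to keep the printout small.
mask = block_causal_mask(seq_len=6, block_size=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
# Prints:
# 110000
# 110000
# 111100
# 111100
# 111111
# 111111
```

Setting the block size to 1 recovers an ordinary causal mask, and setting it to the full sequence length recovers fully bidirectional attention.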

Cola-DLM departs from autoregressive language models such as GPT-4 and Llama. By using diffusion to iteratively refine latent representations, it aims to generate text with better global coherence and controllability. The model uses the OLMo 2 tokenizer with a 100,278-entry vocabulary and special tokens for padding, end-of-sentence, and instruction markers. Released under the Apache 2.0 license, the model is compatible with PyTorch 2.1+ and Hugging Face Transformers 4.40+. The team has also provided a GitHub repository, paper, project page, and blog post for researchers who want to explore continuous latent diffusion for language tasks.
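
The idea of iterative refinement can be sketched as numerical integration of a velocity field that carries a noise sample toward a data latent. The example below is a toy, not the model's sampler: `toy_velocity` is a hypothetical stand-in for the trained DiT, using the closed-form velocity that applies when the data distribution is a single known point and the path from noise to data is a straight line. The step count and the target values are arbitrary.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a trained velocity network.

    For a straight-line path from noise (t=0) to a single data point
    `target` (t=1), the exact velocity at (x, t) is (target - x) / (1 - t).
    """
    return [(m - xi) / (1.0 - t) for xi, m in zip(x, target)]


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # always < 1, so the toy field never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
target = [0.5, -1.0, 2.0]                      # pretend "data" latent
noise = [rng.gauss(0.0, 1.0) for _ in target]  # starting point at t=0
latent = euler_sample(lambda x, t: toy_velocity(x, t, target), noise, num_steps=8)
print(latent)
```

In a real system the refined latent would then be passed to the VAE decoder to obtain tokens; that step is omitted here.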

Key Points
  • Cola-DLM pairs a Text VAE with a block-causal Diffusion Transformer for continuous latent diffusion language modeling
  • Trained with a Flow Matching objective; the released checkpoint sits at the 2000 EFLOPs point of the paper's scaling curve
  • Uses the OLMo 2 tokenizer (100,278-entry vocabulary), is Apache 2.0 licensed, and works with Hugging Face Transformers and PyTorch
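
For readers unfamiliar with Flow Matching, the following sketch shows how one training example is commonly built in the generic formulation: interpolate between a noise sample and a data latent, then regress the model's output onto the constant velocity of that straight line. The convention used here (noise at t=0, data at t=1, linear path) is an assumption; the paper's exact parameterization may differ, and all function names are illustrative.

```python
import random


def flow_matching_pair(noise, data, t):
    """Build the network input x_t and the regression target for one example.

    Assumed convention: x_t = (1 - t) * noise + t * data, so the target
    velocity along the straight path is data - noise.
    """
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, data)]
    target_v = [d - n for n, d in zip(noise, data)]
    return x_t, target_v


def mse(pred, target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


rng = random.Random(0)
data = [0.5, -1.0, 2.0]                      # pretend latent from the VAE encoder
noise = [rng.gauss(0.0, 1.0) for _ in data]  # Gaussian sample of the same shape
t = rng.random()                             # time drawn uniformly from [0, 1)

x_t, target_v = flow_matching_pair(noise, data, t)
zero_prediction = [0.0] * len(data)          # placeholder for a model output
loss = mse(zero_prediction, target_v)
print(x_t, target_v, loss)
```

A trained model would replace `zero_prediction` with its output at `(x_t, t)`, and the loss would be averaged over many sampled triples of noise, data, and time.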

Why It Matters

Continuous latent diffusion offers an alternative to autoregressive models, potentially improving text generation coherence and enabling new capabilities.