New neural estimator gives masked diffusion models 3-5x faster inference
Researchers estimate pairwise mutual information from hidden states to accelerate generation.
Masked diffusion models (MDMs) are powerful for sequence generation but typically decode variables sequentially because they don't explicitly model dependencies between variables. A new paper from Sharma, Wang, and Li (submitted to ICML 2026) introduces a neural estimator that extracts pairwise conditional mutual information (MI) directly from a pretrained MDM's hidden states. The estimator uses the model's own conditional distributions as ground truth for supervision, learning to output the full MI matrix in a single forward pass. This reveals the internal dependency structure the model has learned.
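To make the supervision target concrete, here is a minimal sketch of how a pairwise MI value could be computed from two conditional distributions of the kind an MDM outputs. It is an illustration under assumptions, not the authors' code: the function name `pairwise_mi` and its inputs (`p_i`, the predicted distribution at position i, and `p_j_given_i`, the predicted distribution at position j after fixing each possible value at position i) are hypothetical, and the paper may construct its targets differently.

```python
import math

def pairwise_mi(p_i, p_j_given_i):
    """Mutual information (in nats) between X_i and X_j.

    p_i:          list of length V, the distribution p(x_i | context).
    p_j_given_i:  V rows, row a is the distribution p(x_j | x_i = a, context).
    Both inputs are assumed to come from the model's own predictions.
    """
    n_i = len(p_i)
    n_j = len(p_j_given_i[0])
    # Marginal of X_j: p(b) = sum_a p(a) * p(b | a)
    p_j = [sum(p_i[a] * p_j_given_i[a][b] for a in range(n_i)) for b in range(n_j)]
    mi = 0.0
    for a in range(n_i):
        for b in range(n_j):
            joint = p_i[a] * p_j_given_i[a][b]
            if joint > 0.0:  # skip zero-probability terms (0 * log 0 = 0)
                mi += joint * math.log(p_j_given_i[a][b] / p_j[b])
    return mi

# Two perfectly coupled binary variables carry log(2) nats of shared information.
coupled = pairwise_mi([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
# If x_j's distribution does not change with x_i, the MI is zero.
independent = pairwise_mi([0.3, 0.7], [[0.2, 0.8], [0.2, 0.8]])
```

Computing this exactly for every pair of positions would take many extra model evaluations, which is why a learned estimator that predicts the whole matrix from hidden states in one pass is attractive.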
By identifying conditionally independent subsets of variables, the framework enables MI-guided parallel decoding: generating multiple tokens simultaneously without sacrificing quality. The authors tested on Sudoku puzzles and protein sequence generation using ESM-C. They report a 3-5x reduction in the number of forward passes during inference compared to sequential decoding, while matching generative quality and outperforming entropy-based parallelization baselines. The MI maps also recovered known structural constraints (e.g., Sudoku rules, protein contacts). This opens a path to much faster sampling from discrete diffusion models in domains like drug design and combinatorial optimization.
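One simple way to turn an MI matrix into a parallel decoding step is a greedy filter: walk through the masked positions and keep each one only if its MI with every position already kept falls below a threshold. The sketch below assumes this greedy rule, a hypothetical `threshold` parameter, and a made-up 4x4 matrix; the paper's actual selection procedure may differ.

```python
def select_parallel_set(mi, candidates, threshold):
    """Greedily pick positions that are pairwise near-independent.

    mi:          square matrix (list of lists) of pairwise MI estimates.
    candidates:  masked positions, in the order they should be considered.
    threshold:   hypothetical cutoff below which two positions are treated
                 as conditionally independent.
    """
    chosen = []
    for pos in candidates:
        # Keep pos only if it is weakly coupled to everything chosen so far.
        if all(mi[pos][q] < threshold for q in chosen):
            chosen.append(pos)
    return chosen

# Illustrative values only: positions 0 and 1 are strongly coupled, as are 1 and 3.
mi_matrix = [
    [0.00, 0.90, 0.01, 0.02],
    [0.90, 0.00, 0.03, 0.80],
    [0.01, 0.03, 0.00, 0.00],
    [0.02, 0.80, 0.00, 0.00],
]
step = select_parallel_set(mi_matrix, [0, 1, 2, 3], threshold=0.05)
print(step)  # [0, 2, 3]
```

In this toy case three of the four positions can be filled in one forward pass, with position 1 deferred to the next step because it depends strongly on position 0.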
- Estimates pairwise conditional mutual information from hidden states of masked diffusion models in a single forward pass.
- Enables MI-guided parallel decoding, reducing inference forward passes by 3-5x on Sudoku and protein sequences.
- Applied to ESM-C for protein generation; recovers known structural constraints while maintaining generative quality.
Why It Matters
Speeds up masked diffusion models for sequence generation without quality loss.