New neural estimator gives masked diffusion models 3-5x faster inference

arXiv cs.LG May 21, 2026

⚡Researchers estimate pairwise mutual information from hidden states to accelerate generation.

Deep Dive

Masked diffusion models (MDMs) are powerful for sequence generation, but they typically decode one variable at a time because they do not explicitly represent which variables depend on each other. A new paper from Sharma, Wang, and Li (submitted to ICML 2026) introduces a neural estimator that extracts pairwise conditional mutual information (MI) directly from a pretrained MDM's hidden states. The estimator is supervised with targets derived from the model's own conditional distributions, and it learns to output the full MI matrix in a single forward pass. The resulting matrix exposes the dependency structure the model has learned internally.
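To make the estimated quantity concrete, here is a minimal sketch of mutual information computed directly from a small joint distribution over two variables. It only illustrates the definition, I(X;Y) = sum over x,y of p(x,y) log [p(x,y) / (p(x) p(y))]. It is not the paper's estimator, which predicts these values from hidden states. The function name and the toy tables are assumptions made for illustration, and the context the MI is conditioned on is taken as already fixed in the table.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a joint table joint[x][y].

    The table is assumed to already be conditioned on the current
    context, so this corresponds to one entry of a pairwise MI matrix.
    """
    px = [sum(row) for row in joint]        # marginal over x
    py = [sum(col) for col in zip(*joint)]  # marginal over y
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0.0:                   # 0 * log(0) is treated as 0
                mi += pxy * math.log(pxy / (px[x] * py[y]))
    return mi

# Hypothetical toy distributions over two binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]  # knowing x says nothing about y
coupled = [[0.5, 0.0], [0.0, 0.5]]          # x fully determines y

print(mutual_information(independent))  # zero: the pair is independent
print(mutual_information(coupled))      # log(2) nats: the pair is fully dependent
```

A pair with MI near zero can be sampled independently at little cost, while a pair with high MI needs one variable to be fixed before the other is sampled.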

By identifying conditionally independent subsets of variables, the framework enables MI-guided parallel decoding: multiple tokens are generated in the same step without sacrificing quality. The authors tested the method on Sudoku puzzles and on protein sequence generation with ESM-C. They report a 3-5x reduction in the number of forward passes during inference compared to sequential decoding, while matching its generative quality and outperforming entropy-based parallelization baselines. The MI maps also recovered known structural constraints (e.g., Sudoku rules, protein contacts). This opens a path to much faster sampling from discrete diffusion models in domains like drug design and combinatorial optimization.
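The idea of decoding near-independent variables together can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm. The greedy selection rule, the fixed threshold, the function names, and the toy MI matrix are all hypothetical. A real decoder would also re-estimate the MI matrix after each round, since the MI is conditional on what has already been decoded.

```python
def select_parallel_group(mi, masked, threshold):
    """Greedily pick masked positions whose pairwise MI stays below threshold."""
    group = []
    for i in masked:
        # Accept position i only if it is near-independent of every
        # position already chosen for this round.
        if all(mi[i][j] < threshold for j in group):
            group.append(i)
    return group

def plan_decoding_rounds(mi, num_positions, threshold):
    """Partition positions into rounds; each round would cost one forward pass.

    Assumption: the MI matrix is held fixed across rounds for simplicity.
    """
    masked = list(range(num_positions))
    rounds = []
    while masked:
        group = select_parallel_group(mi, masked, threshold)
        rounds.append(group)
        masked = [i for i in masked if i not in group]
    return rounds

# Hypothetical MI matrix: positions 0-1 and 2-3 are strongly coupled,
# every other pair is nearly independent.
toy_mi = [
    [0.00, 0.90, 0.01, 0.02],
    [0.90, 0.00, 0.03, 0.01],
    [0.01, 0.03, 0.00, 0.80],
    [0.02, 0.01, 0.80, 0.00],
]

rounds = plan_decoding_rounds(toy_mi, num_positions=4, threshold=0.1)
print(rounds)       # [[0, 2], [1, 3]]
print(len(rounds))  # 2 rounds instead of 4 sequential steps
```

In this toy case the coupled pairs are split across rounds, so four positions are decoded in two passes. The saving depends on how sparse the dependency structure is.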

Key Points

Estimates pairwise conditional mutual information from hidden states of masked diffusion models in a single forward pass.
Enables MI-guided parallel decoding, reducing inference forward passes by 3-5x on Sudoku and protein sequences.
Applied to ESM-C for protein generation; recovers known structural constraints while maintaining generative quality.

Why It Matters

Cuts the number of forward passes needed to sample from masked diffusion models by 3-5x, with no reported loss in generative quality.
