Research & Papers

Large Language Models Explore by Latent Distilling

New decoding method ESamp uses prediction errors to explore novel semantic patterns.

Deep Dive

Researchers from Shanghai Jiao Tong University and ByteDance have introduced Exploratory Sampling (ESamp), a novel decoding method that explicitly encourages semantic diversity in large language model (LLM) generation. Unlike standard stochastic sampling, which mainly produces surface-level lexical variation, ESamp leverages a fundamental property of neural networks: they produce lower prediction errors on familiar inputs and higher errors on novel ones. The method trains a lightweight Distiller model at test time to predict deep-layer hidden representations from shallow-layer ones, modeling the LLM's depth-wise representation transitions. During decoding, the Distiller continuously adapts to the current generation context, using its prediction error as a novelty signal to reweight candidate token extensions toward less-explored semantic patterns. ESamp is implemented with an asynchronous training-inference pipeline, achieving less than 5% worst-case overhead (1.2% in the optimized release).
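The paper's exact Distiller architecture and reweighting rule aren't reproduced here; a minimal sketch of the idea, assuming a linear Distiller updated online by SGD on mean-squared error, with each candidate's prediction error added as a log-space novelty bonus (all names and hyperparameters below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D_SHALLOW, D_DEEP, N_CANDS = 16, 16, 5  # toy hidden sizes and candidate count

# Hypothetical linear Distiller: maps shallow-layer states to deep-layer states.
W = np.zeros((D_DEEP, D_SHALLOW))

def distill_step(h_shallow, h_deep, lr=0.01):
    """One online Distiller update; returns squared error (the novelty signal)."""
    global W
    err = h_deep - W @ h_shallow
    W += lr * np.outer(err, h_shallow)  # gradient step on the MSE objective
    return float(err @ err)

def reweight(logits, novelty, alpha=1.0):
    """Shift candidate log-probs toward continuations the Distiller predicts poorly."""
    scores = logits + alpha * np.log1p(novelty)
    scores -= scores.max()  # numerical stability before softmax
    p = np.exp(scores)
    return p / p.sum()

# Simulated per-candidate hidden states and base logits
logits = rng.normal(size=N_CANDS)
novelty = np.array([
    distill_step(rng.normal(size=D_SHALLOW), rng.normal(size=D_DEEP))
    for _ in range(N_CANDS)
])
probs = reweight(logits, novelty)
print(probs.round(3))
```

A real implementation would run the Distiller updates asynchronously alongside inference, as the paper describes, rather than inline in the decoding loop.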

Empirical results demonstrate that ESamp significantly boosts Pass@k efficiency across reasoning, mathematics, science, and code generation benchmarks, matching or outperforming strong stochastic and heuristic baselines. Notably, it breaks the traditional trade-off between diversity and coherence in creative writing tasks, enabling models to generate more varied outputs without sacrificing quality. The method's robust generalization across domains suggests it could become a standard tool for test-time scaling, particularly for applications requiring diverse solution exploration such as code generation and scientific discovery. The code has been released on GitHub.
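The paper's exact evaluation harness isn't given here, but Pass@k is conventionally computed with the standard unbiased estimator: draw n samples per problem, count the c that pass, and estimate the probability that at least one of k samples succeeds:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 100 samples drawn, 10 correct
print(pass_at_k(100, 10, 1))   # ≈ 0.1
print(pass_at_k(100, 10, 10))  # well above 10 * 0.1 would naively suggest capped at 1
```

A decoding method that raises Pass@k at fixed k (or reaches a target Pass@k with fewer samples) is more sample-efficient, which is the sense in which ESamp "boosts Pass@k efficiency."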

Key Points
  • ESamp trains a lightweight Distiller at test time to predict deep-layer representations from shallow-layer ones, with under 5% overhead
  • Boosts Pass@k efficiency on reasoning, math, science, and code benchmarks
  • Breaks the diversity-coherence trade-off in creative writing tasks

Why It Matters

Enables LLMs to explore diverse solutions at test time, improving reasoning and creativity with minimal compute.