Exploratory Sampling boosts LLM diversity with <5% overhead

New decoding method ESamp uses prediction errors to explore novel semantic patterns.

Deep Dive

Researchers from Shanghai Jiao Tong University and ByteDance have introduced Exploratory Sampling (ESamp), a decoding method that explicitly encourages semantic diversity in large language model (LLM) generation. Standard stochastic sampling mainly produces surface-level lexical variation. ESamp instead exploits a general property of neural networks: their prediction error tends to be low on familiar inputs and high on novel ones. The method trains a lightweight Distiller model at test time to predict deep-layer hidden representations from shallow-layer ones, so that it models the LLM's depth-wise representation transitions. During decoding, the Distiller continuously adapts to the current generation context, and its prediction error serves as a novelty signal that reweights candidate token extensions toward less-explored semantic patterns. ESamp is implemented with an asynchronous training-inference pipeline, which keeps worst-case overhead below 5% (1.2% in the optimized release).
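To make the mechanism concrete, here is a minimal toy sketch of the idea, not the authors' implementation. It assumes a purely linear Distiller, hand-made 4-dimensional "hidden states", and an additive bonus of `beta * novelty` on each candidate's log-probability. The names `LinearDistiller`, `reweight`, and `beta` are hypothetical, and the released code may combine the signals quite differently.

```python
import math


class LinearDistiller:
    """Toy linear map from a shallow hidden state to a deep hidden state.

    Assumption: the real Distiller is a small neural network; a linear
    map is used here only to keep the sketch dependency-free.
    """

    def __init__(self, dim, lr=0.5):
        self.dim = dim
        self.lr = lr
        # Zero weights: before any training, every pattern looks novel.
        self.w = [[0.0] * dim for _ in range(dim)]

    def predict(self, shallow):
        return [sum(w_ij * x for w_ij, x in zip(row, shallow)) for row in self.w]

    def error(self, shallow, deep):
        """Mean squared prediction error, used here as the novelty signal."""
        pred = self.predict(shallow)
        return sum((p - d) ** 2 for p, d in zip(pred, deep)) / self.dim

    def update(self, shallow, deep):
        """One SGD step on the mean squared error for a single pair."""
        pred = self.predict(shallow)
        for i in range(self.dim):
            grad_scale = 2.0 * (pred[i] - deep[i]) / self.dim
            for j in range(self.dim):
                self.w[i][j] -= self.lr * grad_scale * shallow[j]


def reweight(log_probs, novelty, beta=1.0):
    """Add beta * novelty to each log-probability, then renormalize."""
    scores = [lp + beta * nv for lp, nv in zip(log_probs, novelty)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical (shallow, deep) hidden-state pairs for two candidate tokens.
familiar = ([1.0, 0.0, 0.0, 0.0], [0.5, -0.5, 0.25, 0.0])
unseen = ([0.0, 1.0, 0.0, 0.0], [1.0, 1.0, -1.0, 0.5])

distiller = LinearDistiller(dim=4, lr=0.5)
for _ in range(50):
    # Test-time adaptation: the Distiller keeps seeing the familiar pattern.
    distiller.update(*familiar)

novelty = [distiller.error(*familiar), distiller.error(*unseen)]
# Both candidates start with the same model probability of 0.5.
probs = reweight([math.log(0.5), math.log(0.5)], novelty, beta=1.0)
print("novelty:", novelty)
print("reweighted probabilities:", probs)
```

In this toy setup the Distiller has already learned the familiar pattern, so its prediction error there is close to zero. The unseen pattern keeps a large error, and its sampling probability rises above the original 0.5. A larger `beta` would push harder toward novelty, at the cost of drifting further from the model's own distribution.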

According to the reported results, ESamp improves Pass@k efficiency across reasoning, mathematics, science, and code generation benchmarks, matching or outperforming strong stochastic and heuristic baselines. The authors also report that it breaks the usual trade-off between diversity and coherence in creative writing tasks, producing more varied outputs without sacrificing quality. Because the gains hold across domains, ESamp could become a standard tool for test-time scaling, particularly in applications that depend on exploring diverse solutions, such as code generation and scientific discovery. The code has been released on GitHub.
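For readers unfamiliar with the metric, Pass@k is the probability that at least one of k sampled answers is correct. The summary above does not say how the paper computes it, so the sketch below is an assumption: it uses the widely adopted unbiased estimator from code-generation evaluation, which draws n samples, counts the c correct ones, and evaluates 1 - C(n-c, k) / C(n, k). The function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate Pass@k from n samples of which c are correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the chance that a
    random subset of k samples contains no correct answer.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the result is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# 10 samples, 3 of them correct.
print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
```

Diversity matters for this metric because near-duplicate samples tend to succeed or fail together, so drawing more of them adds little. Samples that explore different solution paths give each extra draw a better chance of being the one that passes.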

Key Points
  • ESamp trains a lightweight Distiller at test time to predict deep-layer hidden representations from shallow ones, using its prediction error as a novelty signal; worst-case overhead stays under 5%
  • Boosts Pass@k efficiency on reasoning, math, science, and code benchmarks
  • Breaks the diversity-coherence trade-off in creative writing tasks

Why It Matters

ESamp lets LLMs explore more diverse solutions at test time, improving reasoning and creative output at little extra compute cost.
