Research & Papers

The Stepwise Informativeness Assumption: Why Are Entropy Dynamics and Reasoning Correlated in LLMs?

New theory explains why entropy dynamics predict correct answers across models like Llama-3.2 and Gemma-2.

Deep Dive

A team of researchers has introduced the Stepwise Informativeness Assumption (SIA), a theoretical framework that addresses a persistent puzzle in AI: why a large language model's internal uncertainty (its entropy dynamics) robustly predicts whether its final answer will be correct. The paper argues that models reason correctly when each step of their internal 'thought' process adds information about the true answer, so that the model's uncertainty about the answer shrinks as the reasoning unfolds. The authors show that this property emerges naturally from standard maximum-likelihood training on human reasoning data and is reinforced by fine-tuning and reinforcement learning pipelines.
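
In rough information-theoretic terms (a sketch using assumed notation, not necessarily the paper's exact statement), SIA says that each reasoning step carries positive information about the answer given the question and the steps so far, which is equivalent to the conditional entropy of the answer strictly decreasing at every step:

```latex
% Sketch of SIA; Q = question, A = true answer, s_1, ..., s_T = reasoning steps.
% Notation assumed for illustration, not quoted from the paper.
I(A \,;\, s_t \mid Q, s_{1:t-1}) > 0
\quad \Longleftrightarrow \quad
H(A \mid Q, s_{1:t}) < H(A \mid Q, s_{1:t-1}),
\qquad t = 1, \dots, T.
```

Under this reading, a falling entropy trajectory is the observable signature of informative reasoning steps.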

The team empirically validated SIA on major reasoning benchmarks including GSM8K, ARC, and SVAMP, using open-weight models from five families: Meta's Llama-3.2, Google's Gemma-2, Alibaba's Qwen-2.5, DeepSeek, and Olmo. Their findings show that correct reasoning traces exhibit a characteristic, measurable pattern in how the model's conditional entropy over the answer evolves from step to step (see the sketch below). This work moves the field from purely empirical observation to a testable theoretical foundation, providing a new lens to diagnose, understand, and potentially improve the reasoning capabilities of LLMs by analyzing their internal information flow.
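Concretely, one way to probe this pattern is to truncate a reasoning trace after each step, elicit an answer, and measure the entropy of the model's distribution over candidate answers. Below is a minimal sketch using Hugging Face transformers; the checkpoint name, prompt format, and step segmentation are illustrative assumptions, not the authors' exact protocol:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any open-weight causal LM with the same API works.
MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

def answer_entropy(question: str, steps: list[str], choices: list[str]) -> list[float]:
    """Entropy of the model's answer distribution after each reasoning step.

    For every prefix of the chain of thought, elicit an answer and compute the
    Shannon entropy over the candidate answer tokens. Assumes single-token
    answer labels (e.g. "A", "B", "C", "D" for multiple-choice tasks like ARC).
    """
    # Leading space so the label tokenizes as it would follow "Answer:".
    choice_ids = [tokenizer.encode(" " + c, add_special_tokens=False)[0] for c in choices]
    entropies = []
    for t in range(len(steps) + 1):
        prompt = (
            f"Question: {question}\n"
            + "".join(f"Step: {s}\n" for s in steps[:t])
            + "Answer:"
        )
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits[0, -1]        # next-token logits
        probs = F.softmax(logits[choice_ids], dim=-1)     # renormalize over choices
        entropy = -(probs * probs.log()).sum().item()     # Shannon entropy (nats)
        entropies.append(entropy)
    return entropies
```

If SIA holds, the entropies returned for a correct trace would be expected to fall as more steps are included, while traces that stall or wander would show flatter or more erratic trajectories.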

Key Points
  • Proposes the Stepwise Informativeness Assumption (SIA): LLMs reason correctly by accumulating answer-specific information step-by-step.
  • Validated across 5 model families (Gemma-2, Llama-3.2, Qwen-2.5, DeepSeek, Olmo) and 3 benchmarks (GSM8K, ARC, SVAMP).
  • Shows standard training (fine-tuning, RL) induces predictable entropy patterns that correlate with answer correctness.

Why It Matters

Provides a theoretical basis for interpreting LLM 'thinking,' enabling better model diagnostics and more reliable reasoning systems.