Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models
Even at temperature zero, LLMs produce different outputs—here's why.
A new paper from researchers Alberto Messina and Stefano Scotta, published in Transactions on Machine Learning Research (TMLR), introduces the concept of 'background temperature' (T_bg) to characterize hidden randomness in large language models (LLMs). Even when decoding with temperature set to T=0—which should theoretically produce deterministic outputs—LLMs can generate divergent results for identical inputs. The authors build on earlier work by Thinking Machines Lab, which identified implementation-level sources of nondeterminism including batch-size variation, kernel non-invariance, and floating-point non-associativity.
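Floating-point non-associativity, one of the sources named above, is easy to demonstrate directly. The snippet below is plain Python, not tied to any inference stack: summing the same three values in a different order yields different results, which is why a GPU kernel whose reduction order varies with batch size or thread scheduling can produce logits that differ bit-for-bit across runs, occasionally flipping a greedy (T=0) argmax between near-tied tokens.

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # a + b cancels exactly, then + 1.0 -> 1.0
right = a + (b + c)  # 1.0 is absorbed into -1e16 (below its precision) -> 0.0

print(left, right, left == right)  # 1.0 0.0 False
```

The discrepancy here is large for clarity; in practice the perturbations are tiny, but tiny differences in logits are enough to change the argmax when two tokens have nearly equal probability.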
Messina and Scotta formalize this behavior by defining T_bg as the effective temperature induced by an implementation-dependent perturbation process observed even when nominal T=0. They provide clean definitions, show how T_bg relates to a stochastic perturbation governed by the inference environment I, and propose an empirical protocol to estimate T_bg via the equivalent temperature T_n(I) of an ideal reference system. Pilot experiments run on a representative pool of major LLM providers demonstrate the concept, with implications for reproducibility, evaluation, and deployment. The work highlights a critical but often overlooked source of variability in LLM outputs, urging practitioners to account for background temperature when comparing model performance or deploying models in production.
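The paper's exact estimation protocol is not reproduced here, but the idea of an equivalent temperature can be sketched. The function below is a hypothetical illustration under simplifying assumptions: given repeated outputs from nominal T=0 calls at one position, plus the (assumed known) logits for that position, it searches for the temperature at which an ideal softmax sampler would match the observed top-token agreement rate. The function name, the agreement-rate criterion, and the known-logits assumption are all illustrative; they are not the authors' definition of T_n(I).

```python
import math
from collections import Counter

def equivalent_temperature(outputs, logits, tol=1e-6):
    """Hypothetical estimator: the softmax temperature T at which an ideal
    sampler would reproduce the observed top-token agreement rate."""
    counts = Counter(outputs)
    # Fraction of runs that agreed on the most common token.
    p_obs = counts.most_common(1)[0][1] / len(outputs)

    def top_prob(temp):
        # Probability an ideal softmax sampler at this temperature
        # emits the highest-logit token (numerically stabilized).
        z = [l / temp for l in logits]
        m = max(z)
        exps = [math.exp(v - m) for v in z]
        return max(exps) / sum(exps)

    # top_prob decreases monotonically in temperature, so binary search.
    lo, hi = 1e-6, 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if top_prob(mid) > p_obs:
            lo = mid  # too deterministic: raise the temperature
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, if 9 of 10 nominal-T=0 calls agree on the top token and the logits are [2.0, 0.0], the matching ideal temperature solves exp(2/T)/(exp(2/T)+1) = 0.9, giving T = 2/ln 9 ≈ 0.91; the search returns that value.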
- Background temperature (T_bg) quantifies hidden randomness in LLMs even at nominal T=0.
- Sources include batch-size variation, kernel non-invariance, and floating-point non-associativity.
- Pilot experiments across major LLM providers demonstrate the concept and its implications for reproducibility.
Why It Matters
Background temperature explains why LLM outputs can vary even under nominally deterministic settings, which affects both the reliability of production systems and the reproducibility of benchmark results.