Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models
Even at temperature zero, LLMs produce different outputs—here's why.
A new paper from researchers Alberto Messina and Stefano Scotta, published in Transactions on Machine Learning Research (TMLR), introduces the concept of 'background temperature' (T_bg) to characterize hidden randomness in large language models (LLMs). Even when decoding with temperature set to T=0—which should theoretically produce deterministic outputs—LLMs can generate divergent results for identical inputs. The authors build on earlier work by Thinking Machines Lab, which identified implementation-level sources of nondeterminism including batch-size variation, kernel non-invariance, and floating-point non-associativity.
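Floating-point non-associativity, one of the sources named above, is easy to demonstrate directly. The snippet below is plain Python, not tied to any inference stack: summing the same three values in a different order yields different results, which is why a GPU kernel whose reduction order varies with batch size or thread scheduling can produce logits that differ bit-for-bit across runs, occasionally flipping a greedy (T=0) argmax between near-tied tokens.

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # a + b cancels exactly, then + 1.0 -> 1.0
right = a + (b + c)  # 1.0 is absorbed into -1e16 (below its precision) -> 0.0

print(left, right, left == right)  # 1.0 0.0 False
```

The discrepancy here is large for clarity; in practice the perturbations are tiny, but tiny differences in logits are enough to change the argmax when two tokens have nearly equal probability.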
Messina and Scotta formalize this behavior by defining T_bg as the effective temperature induced by an implementation-dependent perturbation process observed even when nominal T=0. They provide clean definitions, show how T_bg relates to a stochastic perturbation governed by the inference environment I, and propose an empirical protocol to estimate T_bg via the equivalent temperature T_n(I) of an ideal reference system. Pilot experiments run on a representative pool of major LLM providers demonstrate the concept, with implications for reproducibility, evaluation, and deployment. The work highlights a critical but often overlooked source of variability in LLM outputs, urging practitioners to account for background temperature when comparing model performance or deploying models in production.
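The paper's exact estimation protocol is not reproduced here, but the idea of an equivalent temperature can be sketched. The function below is a hypothetical illustration under simplifying assumptions: given repeated outputs from nominal T=0 calls at one position, plus the (assumed known) logits for that position, it searches for the temperature at which an ideal softmax sampler would match the observed top-token agreement rate. The function name, the agreement-rate criterion, and the known-logits assumption are all illustrative; they are not the authors' definition of T_n(I).

```python
import math
from collections import Counter

def equivalent_temperature(outputs, logits, tol=1e-6):
    """Hypothetical estimator: the softmax temperature T at which an ideal
    sampler would reproduce the observed top-token agreement rate."""
    counts = Counter(outputs)
    # Fraction of runs that agreed on the most common token.
    p_obs = counts.most_common(1)[0][1] / len(outputs)

    def top_prob(temp):
        # Probability an ideal softmax sampler at this temperature
        # emits the highest-logit token (numerically stabilized).
        z = [l / temp for l in logits]
        m = max(z)
        exps = [math.exp(v - m) for v in z]
        return max(exps) / sum(exps)

    # top_prob decreases monotonically in temperature, so binary search.
    lo, hi = 1e-6, 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if top_prob(mid) > p_obs:
            lo = mid  # too deterministic: raise the temperature
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, if 9 of 10 nominal-T=0 calls agree on the top token and the logits are [2.0, 0.0], the matching ideal temperature solves exp(2/T)/(exp(2/T)+1) = 0.9, giving T = 2/ln 9 ≈ 0.91; the search returns that value.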
- Background temperature (T_bg) quantifies hidden randomness in LLMs even at nominal T=0.
- Sources include batch-size variation, kernel non-invariance, and floating-point non-associativity.
- Pilot experiments across major LLM providers demonstrate the concept and its implications for reproducibility.
Why It Matters
Background temperature explains why LLM outputs can vary even under nominally deterministic settings, which affects both the reliability of production systems and the reproducibility of benchmark results.