Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models
Study shows optimal temperature varies by prompting strategy, challenging standard practices.
A new arXiv preprint from researchers Mousa Salah and Amgad Muneer systematically investigates how temperature settings affect prompting strategies in extended reasoning large language models. The study challenges the common practice of using T=0 for reasoning tasks by evaluating chain-of-thought and zero-shot prompting across four temperature settings (0.0, 0.4, 0.7, and 1.0) using xAI's Grok-4.1 model on 39 challenging mathematical problems from the AMO-Bench benchmark.
Key findings reveal that zero-shot prompting peaks at moderate temperatures, reaching 59% accuracy at both T=0.4 and T=0.7, while chain-of-thought prompting performs best at the temperature extremes (T=0.0 and T=1.0). Most strikingly, the benefit of extended reasoning, in which models perform explicit test-time computation, grows from 6x at T=0.0 to 14.3x at T=1.0, suggesting that temperature optimization is crucial for maximizing reasoning capabilities.
The research demonstrates that temperature and prompting strategy must be optimized jointly rather than independently. This has practical implications for developers and researchers working with advanced reasoning models, indicating that default settings may be leaving significant performance gains on the table. The study provides concrete guidance for configuring extended reasoning systems to achieve optimal results across different problem types and complexity levels.
- Zero-shot prompting peaks at 59% accuracy at moderate temperatures (T=0.4-0.7)
- Extended reasoning benefits increase from 6x at T=0.0 to 14.3x at T=1.0
- Chain-of-thought performs best at temperature extremes, not at T=0 as commonly assumed
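The joint sweep behind these findings can be sketched as a small grid search over (strategy, temperature) pairs. In this sketch, `evaluate` is a placeholder scorer: only the 59% zero-shot peak at moderate temperatures reflects the paper's reported numbers, and the other values are illustrative stand-ins; a real run would call the model API and score answers against the benchmark.

```python
from itertools import product

# The two prompting strategies and four temperature settings from the study.
STRATEGIES = ["zero-shot", "chain-of-thought"]
TEMPERATURES = [0.0, 0.4, 0.7, 1.0]

def evaluate(strategy: str, temperature: float) -> float:
    """Placeholder accuracy; replace with real benchmark runs.

    Illustrative values only: the 0.59 zero-shot peak at moderate
    temperatures mirrors the paper; the rest are made-up stand-ins
    chosen to reproduce the qualitative shape of the findings.
    """
    if strategy == "zero-shot":
        return 0.59 if temperature in (0.4, 0.7) else 0.45
    # Chain-of-thought: best at the temperature extremes.
    return 0.50 if temperature in (0.0, 1.0) else 0.42

def best_configuration():
    # Optimize temperature and strategy jointly, not independently:
    # score every cell of the grid, then take the argmax.
    grid = product(STRATEGIES, TEMPERATURES)
    scores = {cfg: evaluate(*cfg) for cfg in grid}
    best = max(scores, key=scores.get)
    return best, scores

best, scores = best_configuration()
print(best)
```

The point of the structure, echoing the paper's conclusion, is that fixing temperature first (e.g. defaulting to T=0) and then choosing a prompt can miss the jointly optimal cell of the grid.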
Why It Matters
Optimizing temperature with prompting strategy can dramatically improve reasoning performance in AI systems.