What counts as illegible reasoning?
DeepSeek R1 and QwQ-32B exhibit gibberish CoT, but results are hard to reproduce.
A new investigation into illegible reasoning in large language models reveals that both closed and open reasoning models sometimes produce chain-of-thought (CoT) tokens that are semantically incoherent—like 'parted disclaim marinade'—while still generating correct answers. This phenomenon, first noted by Apollo Research and METR in OpenAI models, raises critical questions for AI safety: if such illegible reasoning is load-bearing (i.e., required for task performance), it could undermine chain-of-thought monitoring as a safety strategy. The paper 'Reasoning Models Sometimes Output Illegible Chains of Thought' found that DeepSeek R1, R1-Zero, and QwQ-32B often output illegible reasoning on GPQA questions, as scored by GPT-4o. Truncating QwQ's CoT when illegibility began reduced accuracy, suggesting the garbled tokens are functionally important.
However, attempts to reproduce these findings have been inconsistent. The author ran the original inference and scoring code for DeepSeek R1 and found no illegible reasoning examples, despite the grader model (GPT-4o) being consistent when re-scoring. This discrepancy may stem from limitations in LLM-as-judge metrics, which conflate multiple behaviors—like language switching or model confusion—with true semantic incoherence. Language switching was largely coherent but still scored as illegible, while confusion was understandable but also penalized. The author calls for refined metrics to isolate the most concerning behaviors for chain-of-thought monitoring, and invites community contributions to elicit reproducible examples in open models.
- Illegible reasoning (e.g., 'parted disclaim marinade') observed in OpenAI models and open models like DeepSeek R1 and QwQ-32B
- Truncating QwQ's CoT at illegibility onset reduced accuracy, implying garbled tokens are load-bearing
- Reproduction attempts for DeepSeek R1 failed; LLM-as-judge metrics conflate language switching and confusion with true incoherence
Why It Matters
Illegible CoT threatens safety monitoring; reproducing it in open models is key to understanding AI reasoning.