Open Source

llama-server bug: extra spaces in JSON break Qwen3.6's preserve_thinking

A single space in your model config could silently break reasoning output.

Deep Dive

A niche but critical bug has surfaced for developers running Qwen3.6 through llama-server (build v9102). The preserve_thinking parameter, which controls whether the model outputs its internal reasoning, stops working if the chat-template-kwargs JSON contains extra spaces. Specifically, `{ "preserve_thinking": true }` fails while `{"preserve_thinking": true}` works fine. The root cause appears to be a parsing quirk in how llama-server’s .ini config reader handles whitespace within JSON strings.

Users can verify the bug by sending a prompt like “think of a number from 1 to 100, don’t tell me what it is, I’m going to guess it” and checking if the reasoning output remains consistent. If the hidden number changes between guesses, template kwargs are being parsed incorrectly. This affects anyone using Qwen3.6 with llama-server and relying on preserved reasoning traces for debugging, chaining, or transparency in their AI workflows.

Key Points
  • Extra spaces in chat-template-kwargs JSON cause preserve_thinking to silently fail in llama-server v9102.
  • Correct format is `{"preserve_thinking": true}` — no leading/trailing or inner spaces before the opening brace or after the colon.
  • Test with a number-guessing prompt: inconsistent reasoning output indicates a parsing error.

Why It Matters

A tiny config formatting mistake can silently break reasoning transparency, wasting hours of debugging for AI developers.