Can we talk about the reasoning token format chaos?

r/LocalLLaMA April 10, 2026

⚡ Open models such as Qwen and Gemma mark their reasoning output with incompatible delimiters, forcing developers to write a custom parser for each one.

Deep Dive

A frustration among AI developers has gone viral: there is no standard for how models output internal "reasoning" or "chain-of-thought" tokens. Major open-source models like Qwen and DeepSeek wrap these reasoning steps in XML-style `<think>...</think>` tags. Google's Gemma model family uses a different format, sometimes emitting `<|channel>...<channel|>` tags and other times outputting bare thought text with no delimiters at all.

This inconsistency creates a maintenance burden for anyone building on top of these models. Infrastructure tools like vLLM have introduced `--reasoning-parser` flags to handle specific models, but developers note this is a stopgap that forces maintainers to play "whack-a-mole" with each new model release. Teams that process raw model outputs for logging, evaluation, or further analysis have to write and maintain a custom parser for every model, which is tedious and error-prone. The community is drawing direct parallels to the earlier fragmentation and eventual standardization of "chat templates," warning that the industry will repeat the same costly mistakes unless a formal standard emerges.
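To make the per-model parsing burden concrete, here is a minimal Python sketch of the kind of plumbing the post describes. It is an illustration under stated assumptions, not any library's actual API: the `ReasoningFormat` class, the `FORMATS` registry, its keys, and `split_reasoning` are hypothetical names, and the delimiters are taken only from the formats mentioned above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningFormat:
    """Opening and closing delimiters that wrap a model's reasoning text."""
    open_tag: str
    close_tag: str


# Hypothetical registry: the keys are illustrative labels, not official model IDs.
FORMATS = {
    "think": ReasoningFormat("<think>", "</think>"),
    "channel": ReasoningFormat("<|channel>", "<channel|>"),
}


def split_reasoning(text: str, fmt: ReasoningFormat) -> tuple[str, str]:
    """Return (reasoning, answer) for the first delimited span in `text`.

    If no delimited span is found, reasoning is empty and the whole
    text is returned as the answer.
    """
    pattern = re.compile(
        re.escape(fmt.open_tag) + r"(.*?)" + re.escape(fmt.close_tag),
        re.DOTALL,  # reasoning often spans multiple lines
    )
    match = pattern.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the delimited span is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    raw = "<think>2+2 is 4</think>The answer is 4."
    print(split_reasoning(raw, FORMATS["think"]))
```

Every new delimiter style needs another registry entry, which is the "whack-a-mole" pattern in miniature. Bare, un-delimited thought text cannot be separated by this approach at all: it falls through as part of the answer.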

Key Points

Qwen and DeepSeek models use `<think>...</think>` tags for reasoning outputs, while Google's Gemma uses incompatible `<|channel>` tags or bare text.
Infrastructure tools like vLLM offer model-specific `--reasoning-parser` flags, but this is a fragile, reactive solution requiring constant updates.
Developers must write custom parsing logic for each model, recreating the fragmentation and maintenance hell previously seen with chat template formats.

Why It Matters

This fragmentation slows down AI application development, increases bugs, and forces teams to waste engineering hours on brittle, model-specific plumbing code.
