Tampermonkey script adds reasoning toggle for Qwen 3.6 on llama.cpp
Toggle Qwen 3.6's thinking on and off with a single button in llama.cpp's web chat.
A new Tampermonkey script by developer Eaman brings a long-requested feature to llama.cpp's web chat interface: a one-click toggle that enables or disables reasoning for Qwen 3.6 models. The script works by intercepting fetch requests to the `/v1/chat/completions` endpoint. When reasoning is turned off, it modifies the request body to set `enable_thinking: false` and `reasoning_budget: 0`. When reasoning is turned back on, it reverts those values. Because the change happens in the browser, there is no need to modify llama.cpp's source code or rebuild it.
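The interception described above can be sketched roughly as follows. This is not the script's actual code: `applyReasoningFlags` and `wrapFetch` are hypothetical names, the two fields are assumed to sit at the top level of the JSON body, and "reverting" is interpreted here as simply leaving the request untouched.

```javascript
// Sketch only: field placement and the meaning of "revert" are assumptions.
const CHAT_PATH = "/v1/chat/completions";

// Hypothetical helper: returns the JSON body text to send for a given toggle state.
function applyReasoningFlags(bodyText, reasoningOn) {
  // Assumption: with reasoning on, the request is left exactly as the UI built it.
  if (reasoningOn) return bodyText;
  let body;
  try {
    body = JSON.parse(bodyText);
  } catch (err) {
    return bodyText; // not JSON, so pass it through unchanged
  }
  if (body === null || typeof body !== "object" || Array.isArray(body)) {
    return bodyText; // only plain JSON objects are patched
  }
  body.enable_thinking = false; // assumption: top-level field
  body.reasoning_budget = 0;    // assumption: top-level field
  return JSON.stringify(body);
}

// Hypothetical wrapper: patches only string bodies sent to the chat endpoint.
// Requests passed as a Request object carry their body elsewhere and are not handled here.
function wrapFetch(originalFetch, isReasoningOn) {
  return function patchedFetch(input, init) {
    const url = typeof input === "string" ? input : String((input && input.url) || input);
    if (url.includes(CHAT_PATH) && init && typeof init.body === "string") {
      // Copy init so the caller's object is not mutated.
      init = { ...init, body: applyReasoningFlags(init.body, isReasoningOn()) };
    }
    return originalFetch(input, init);
  };
}
```

In a userscript, a wrapper like this would typically be installed with something along the lines of `window.fetch = wrapFetch(window.fetch.bind(window), readToggle)`, where `readToggle` is a placeholder for whatever function reads the saved toggle state.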
The script also injects a styled toggle button directly into the web UI, placed next to the file upload button. The button uses matching colors, a rounded pill shape, and transitions so that it feels native. State is saved in `localStorage` under the key `qwen_reasoning`, so the preference persists across sessions. Installation requires the Tampermonkey browser extension and the provided userscript, which matches the URLs `http://localhost:8080/*` and `http://127.0.0.1:8080/*`. The result gives local inference users direct on/off control over model reasoning without sacrificing convenience.
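A minimal sketch of the persistence and button pieces might look like the following. The `@match` lines and the `qwen_reasoning` key come from the description above; everything else is an assumption made for illustration, including the script name, the `"true"`/`"false"` storage format, the default of reasoning being on, the button labels, and the helper names.

```javascript
// ==UserScript==
// @name         Qwen reasoning toggle (illustrative sketch, hypothetical name)
// @match        http://localhost:8080/*
// @match        http://127.0.0.1:8080/*
// @grant        none
// ==/UserScript==

const STORAGE_KEY = "qwen_reasoning"; // key named in the article

// Assumption: the value is stored as the string "true" or "false", defaulting to on.
function loadReasoning(storage) {
  return storage.getItem(STORAGE_KEY) !== "false";
}

function saveReasoning(storage, on) {
  storage.setItem(STORAGE_KEY, String(on));
}

// Flips the saved state and returns the new value.
function toggleReasoning(storage) {
  const next = !loadReasoning(storage);
  saveReasoning(storage, next);
  return next;
}

// Hypothetical helper: builds a pill-shaped button whose label tracks the saved state.
// Where it is attached in the page is left to the caller, since the real selector is unknown.
function createToggleButton(doc, storage) {
  const btn = doc.createElement("button");
  btn.type = "button";
  btn.style.borderRadius = "9999px";               // rounded pill shape
  btn.style.transition = "background-color 0.2s";  // smooth state change
  const render = () => {
    btn.textContent = loadReasoning(storage) ? "Reasoning: on" : "Reasoning: off";
  };
  btn.addEventListener("click", () => {
    toggleReasoning(storage);
    render();
  });
  render();
  return btn;
}
```

In the browser, the helpers would be called with the real objects, for example `createToggleButton(document, localStorage)`; passing them in as parameters also lets the storage logic be exercised outside a page.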
- Intercepts fetch requests to `/v1/chat/completions` to toggle the `enable_thinking` and `reasoning_budget` parameters.
- Injects a native-style button into llama.cpp's web chat UI to toggle reasoning on/off.
- Persists toggle state in `localStorage` across browser sessions, with no recompilation needed.
Why It Matters
Gives local LLM users quick control over Qwen 3.6 reasoning without rebuilding llama.cpp.