Tampermonkey script adds reasoning toggle for Qwen 3.6 on llama.cpp

r/LocalLLaMA May 31, 2026

⚡ Toggle Qwen 3.6's thinking on and off with a single button in llama.cpp's web chat.

Deep Dive

A new Tampermonkey script by developer Eaman brings a long-requested feature to llama.cpp's web chat interface: a one-click toggle to enable or disable reasoning for Qwen 3.6 models. The script works by intercepting fetch requests to the `/v1/chat/completions` endpoint. When reasoning is turned off, it rewrites the request body to set `enable_thinking: false` and `reasoning_budget: 0`; when reasoning is turned back on, it reverts those values. Because the change happens in the browser, there is no need to modify the llama.cpp source code or rebuild it.
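The interception described above could look roughly like the following TypeScript sketch. It is not the author's code: only the endpoint path and the two field names come from the article. The helper names (`applyReasoningToggle`, `wrapFetch`, `FetchLike`), the placement of the fields at the top level of the JSON body, and the reading of "reverts" as "set `enable_thinking` to true and drop `reasoning_budget`" are all assumptions made for illustration.

```typescript
// Endpoint path taken from the article; everything else here is a hypothetical sketch.
const CHAT_ENDPOINT = "/v1/chat/completions";

type ChatBody = Record<string, unknown>;

// Return a new JSON string with the reasoning fields adjusted.
function applyReasoningToggle(bodyText: string, reasoningOn: boolean): string {
  const body = JSON.parse(bodyText) as ChatBody;
  if (reasoningOn) {
    // Assumption: "reverting" means re-enabling thinking and removing the zero budget.
    body["enable_thinking"] = true;
    delete body["reasoning_budget"];
  } else {
    body["enable_thinking"] = false;
    body["reasoning_budget"] = 0;
  }
  return JSON.stringify(body);
}

// Simplified stand-in for the browser's fetch signature (string URLs and string bodies only).
type FetchLike = (
  input: string,
  init?: { method?: string; body?: string }
) => Promise<unknown>;

// Wrap a fetch-like function so chat-completion requests are rewritten before they are sent.
function wrapFetch(original: FetchLike, isReasoningOn: () => boolean): FetchLike {
  return (input, init) => {
    const body = init?.body;
    if (input.includes(CHAT_ENDPOINT) && typeof body === "string") {
      const patched = { ...init, body: applyReasoningToggle(body, isReasoningOn()) };
      return original(input, patched);
    }
    // Any other request passes through untouched.
    return original(input, init);
  };
}
```

In a userscript, the wrapped function would replace `window.fetch`, so every chat request from the page passes through it. This sketch only handles calls made with a string URL and a string body; a real script would also need to cope with `Request` objects.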

The script also injects a styled toggle button directly into the web UI, next to the file upload button. The button uses matching colors, a rounded pill shape, and transitions so that it feels native. The state is saved in `localStorage` under the key `qwen_reasoning`, so the preference persists across sessions. Installation requires the Tampermonkey browser extension and the provided userscript, which matches the URLs `http://localhost:8080/*` and `http://127.0.0.1:8080/*`. The result gives local inference users direct control over model reasoning without sacrificing convenience.
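The persistence step can be illustrated with a small TypeScript sketch. Only the key name `qwen_reasoning` comes from the article. The stored format (the strings "true" and "false"), the default of reasoning being on when nothing is stored, and the names `KeyValueStore`, `loadReasoningState`, `toggleReasoningState`, and `createMemoryStore` are assumptions for illustration, not details of the actual script.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Key name taken from the article.
const STORAGE_KEY = "qwen_reasoning";

// Assumption: the value is stored as "true"/"false" and reasoning defaults to on.
function loadReasoningState(store: KeyValueStore): boolean {
  return store.getItem(STORAGE_KEY) !== "false";
}

// Flip the saved state, persist it, and return the new value.
function toggleReasoningState(store: KeyValueStore): boolean {
  const next = !loadReasoningState(store);
  store.setItem(STORAGE_KEY, String(next));
  return next;
}

// In-memory stand-in for localStorage, useful for trying the logic without a page.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}
```

In the browser, `window.localStorage` satisfies this interface and could be passed in directly, with the injected button's click handler calling `toggleReasoningState` and updating its own appearance from the returned value.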

Key Points

Intercepts fetch requests to `/v1/chat/completions` to toggle the `enable_thinking` and `reasoning_budget` parameters.
Injects a native-style button into llama.cpp's web chat UI to toggle reasoning on/off.
Persists the toggle state in `localStorage` across browser sessions; no recompilation is needed.

Why It Matters

Gives local LLM users quick control over Qwen 3.6 reasoning without rebuilding llama.cpp.
