Qwen Models with Claude Code on 36GB VRAM - Insights
User benchmarks show the 80B model runs stably on 36GB of VRAM, matching Claude Sonnet 4.5's reliability.
A developer's hands-on benchmark of two Alibaba Qwen models running locally with Claude Code has produced valuable, practical data for the AI community. Testing Qwen3-Coder-Next-80B and Qwen3.5-35B with Unsloth's GGUF quantizations, the developer successfully loaded both models into a combined 36GB of VRAM across an RTX 3090 and an RTX 5070, with a context length of approximately 132,000 tokens. They noted the 35B model could likely fit a higher 5- or 6-bit quantization in the available memory.
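A setup like this is typically served with llama.cpp's `llama-server`, which loads GGUF files and can split layers across GPUs. The sketch below is an assumption about the invocation, not the poster's actual command; the flags are real llama.cpp options, but the model filename and split ratio are illustrative:

```shell
# Hypothetical llama-server launch for the 80B GGUF on a 3090 + 5070 pair.
# -c              : ~132k-token context window, as in the test
# -ngl 99         : offload all layers to the GPUs
# --tensor-split  : divide layers ~24:12, matching the two cards' VRAM
llama-server -m Qwen3-Coder-Next-80B-IQ3_XXS.gguf \
  -c 132000 -ngl 99 --tensor-split 24,12 --port 8080
```

The `--tensor-split` values are proportions, so `24,12` simply weights the 24GB card twice as heavily as the 12GB card.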
The key finding was a stark difference in stability. The larger Qwen3-Coder-Next-80B model was deemed "superior in all aspects," reliably completing coding tasks without errors in tool calls. In contrast, the Qwen3.5-35B model consistently stalled mid-job in Claude Code, requiring manual intervention via the `/execute-plan` command from Superpowers to continue. Even after trying the suggested parameters and updating to the latest Unsloth GGUF release, which addressed a known bug, the developer found the 35B model's performance unsatisfactory.
For developers using Claude Code as a local AI coding agent, the 80B model proved comparable to Anthropic's cloud-based Claude Sonnet 4.5 in speed and, more importantly, reliability. This real-world test challenges the simple assumption that a smaller model is always preferable for local deployment, highlighting that stability and correct tool use can matter more than raw parameter count in practical applications.
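Running Claude Code against a self-hosted model works through its documented gateway settings. A minimal sketch, assuming a local endpoint on port 8080 that speaks Anthropic's Messages API (llama-server itself exposes an OpenAI-style API, so a translation proxy is typically placed in front of it):

```shell
# Point Claude Code at a local gateway instead of Anthropic's cloud.
# ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN are standard Claude Code
# settings; the URL and token values here are placeholders.
export ANTHROPIC_BASE_URL="http://localhost:8080"
export ANTHROPIC_AUTH_TOKEN="local-dummy-key"
claude
```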
- Qwen3-Coder-Next-80B ran stably on 36GB of VRAM, matching Claude Sonnet 4.5's reliability for local coding tasks.
- The Qwen3.5-35B model failed consistently, stopping mid-job in Claude Code despite parameter tweaks and bug fixes.
- Test used Unsloth GGUF quantizations (IQ3_XXS for 80B, Q4_K_XL for 35B) with a ~132k context window.
Why It Matters
Provides crucial real-world data for developers choosing local AI coding models, showing that stability can matter more than a smaller memory footprint.