Open Source

Slower Means Faster: Why I Switched from Qwen3 Coder Next to Qwen3.5 122B

A developer found the slower 122B model completed twice the work with fewer crashes and hallucinations.

Deep Dive

A developer's viral experiment reveals a counterintuitive truth in local AI coding: sometimes slower is faster. Alibaba's Qwen3 Coder Next posted impressive specs on their RTX 5070 Ti rig (roughly 1,000 t/s prompt processing and ~37 t/s generation), but its real-world performance fell short. Used in a Ralph-style agentic setup to work autonomously through 110 coding tasks, the model crashed constantly and completed only about 15 tasks on a good day, despite its high token throughput.

Frustrated, the developer switched to the much larger Qwen3.5 122B model, expecting a slowdown. On paper, the 122B was indeed slower: ~700 t/s prefill and ~17 t/s generation, roughly half the throughput. In practice, the opposite happened. The larger model completed roughly twice the work in the same time, and with far greater stability: the backend stopped crashing, outputs required fewer retries, and the higher code quality meant less time spent fixing errors.

This case study highlights a critical lesson for developers running local LLMs: raw token generation speed (t/s) is a poor proxy for real-world task throughput in complex, agentic workflows. A faster model that hallucinates more, crashes frequently, or produces low-quality code can create massive overhead through debugging, restarting, and manual correction. For complex coding tasks where reliability and reasoning depth matter, a larger, more capable model—even if slower on paper—can dramatically increase effective productivity by getting things right the first time.
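The trade-off described above can be made concrete with a back-of-envelope model: effective throughput is not raw tokens per second but completed tasks per hour, after paying for failed attempts and crash recovery. The sketch below uses hypothetical numbers (tokens per task, success rates, restart overhead) chosen only for illustration; none of them are measurements from the experiment.

```python
# Back-of-envelope model of effective task throughput in an agentic
# coding loop. All parameter values are illustrative assumptions,
# not measurements from the article.

def effective_tasks_per_hour(gen_tps, tokens_per_task, success_rate,
                             crash_overhead_s=60.0):
    """Tasks/hour after accounting for retries and crash recovery.

    gen_tps          -- raw generation speed, tokens per second
    tokens_per_task  -- tokens the model emits per task attempt
    success_rate     -- fraction of attempts that complete the task
    crash_overhead_s -- time lost restarting after each failed attempt
    """
    attempt_s = tokens_per_task / gen_tps
    # Expected attempts per completed task is 1 / success_rate;
    # each failed attempt also pays the restart overhead.
    attempts = 1.0 / success_rate
    failures = attempts - 1.0
    seconds_per_task = attempts * attempt_s + failures * crash_overhead_s
    return 3600.0 / seconds_per_task

# Fast-but-flaky vs. slow-but-reliable, with hypothetical reliability rates.
fast = effective_tasks_per_hour(gen_tps=37, tokens_per_task=4000,
                                success_rate=0.35, crash_overhead_s=120)
slow = effective_tasks_per_hour(gen_tps=17, tokens_per_task=4000,
                                success_rate=0.90, crash_overhead_s=120)
print(f"fast model: {fast:.1f} tasks/h, slow model: {slow:.1f} tasks/h")
```

Under these assumed rates, the model generating tokens at less than half the speed finishes roughly twice as many tasks per hour, mirroring the article's observed outcome.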

Key Points
  • Qwen3.5 122B completed 30+ tasks/day vs. Qwen3 Coder Next's ~15, despite generating tokens at less than half the speed (17 t/s vs. 37 t/s)
  • The larger model provided superior backend stability, fewer crashes, and required significantly less manual intervention and debugging
  • The experiment shows raw token speed is misleading for agentic coding; model capability and reliability drive real throughput

Why It Matters

For developers building AI coding agents, choosing models based on capability and stability over raw speed metrics can double real-world productivity.