Open Source

Qwen3.6 27B Coding Agent Test: Copilot 3x Slower Than Pi, Claude, OpenCode

Same model, 13 vs. 4 LLM calls: the harness matters more than you think.

Deep Dive

Developer sdfgeoff tested the Qwen3.6 27B model across four coding agent harnesses: GitHub Copilot, Pi, Claude Code, and OpenCode. The results show large performance differences even though the model was identical in every run. Copilot needed 13 LLM requests and more than 14 minutes to create an SVG file, while Pi, Claude Code, and OpenCode each used 4 requests and finished in about 3:30. OpenCode's internet search gave it an edge on research tasks: it could pull specific filament temperatures for a 3D printer explainer. The model struggled with Copilot's tool schema and repeatedly re-edited the same diffs. It also got stuck in a loop in OpenCode on the pelican.svg task.

Key Points
  • GitHub Copilot required 13 LLM requests vs. just 4 for Pi, Claude Code, and OpenCode on the same task.
  • Copilot used 21,184 output tokens and took 14:26; the other three averaged about 5,000 tokens and about 3:30 each.
  • OpenCode’s internet search feature gave better results on research-heavy tasks (e.g., specific 3D printer filament temps).
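The overhead implied by these figures can be worked out directly. The sketch below is only a back-of-the-envelope calculation: it treats the approximate averages reported for the other three harnesses (about 5,000 tokens, about 3:30) as exact, so the resulting ratios are rough. The names `parse_mmss`, `copilot`, and `others` are illustrative and do not come from the original test.

```python
def parse_mmss(text: str) -> int:
    """Convert a 'minutes:seconds' string such as '14:26' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


# Figures reported for Copilot in the key points above.
copilot = {
    "requests": 13,
    "output_tokens": 21_184,
    "seconds": parse_mmss("14:26"),
}

# Approximate averages reported for Pi, Claude Code, and OpenCode.
# Assumption: the "~" values are treated as exact for this estimate.
others = {
    "requests": 4,
    "output_tokens": 5_000,
    "seconds": parse_mmss("3:30"),
}

# Ratio of Copilot's cost to the others' cost for each metric.
ratios = {
    metric: round(copilot[metric] / others[metric], 2)
    for metric in copilot
}

for metric, ratio in ratios.items():
    print(f"{metric}: Copilot used {ratio}x as much")
```

On these numbers, Copilot made a little over 3x as many requests but used slightly more than 4x the output tokens and wall-clock time, so the "3x" headline figure is closest to the request count.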

Why It Matters

Agent harness design can bottleneck even the best models. In this test, Copilot's tool interface cost roughly 3x the LLM calls and about 4x the time and output tokens of the simpler alternatives.