Running Qwen3.5-27B locally as the primary model in OpenCode
A developer successfully used a local 27B-parameter model as the primary reasoning engine for an advanced coding agent.
A developer's weekend experiment demonstrates the viability of running a powerful open-source model locally for complex agentic coding workflows. Aayush Garg integrated Alibaba's Qwen3.5-27B, a 27-billion-parameter hybrid-architecture model, as the core reasoning engine for the OpenCode coding assistant. The model ran locally on an NVIDIA RTX 4090 workstation via llama.cpp, quantized to 4-bit precision, consuming ~22GB of VRAM and supporting a 64K-token context window. Performance metrics showed a prefill speed of ~2,400 tokens/second and a generation speed of ~40 tokens/second.
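llama.cpp's bundled server exposes an OpenAI-compatible chat API, which is what makes this kind of local swap possible. A minimal sketch of verifying the setup and estimating generation speed (the port, model name, and prompt here are illustrative assumptions, not values from the write-up):

```python
import json
import time
import urllib.request

def build_request(prompt: str, max_tokens: int = 128) -> bytes:
    """Build an OpenAI-style chat-completion payload for llama.cpp's server."""
    return json.dumps({
        "model": "qwen3.5-27b",  # whatever name the server was launched with (assumption)
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()

def measure_tok_per_s(base_url: str = "http://localhost:8080") -> float:
    """Time one completion and estimate tokens/second from the usage stats."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=build_request("Write a one-line Python hello world."),
        headers={"Content-Type": "application/json"},
    )
    start = time.time()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.time() - start
    return body["usage"]["completion_tokens"] / elapsed

# speed = measure_tok_per_s()  # requires the llama.cpp server to be running locally
```

A number in the ballpark of the reported ~40 tok/s would confirm the GPU offload and quantization are working as intended.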
In practical testing, the Qwen3.5-27B-powered OpenCode agent correctly executed tool calls for multi-step coding tasks: writing Python scripts, making edits, debugging, and testing code. Performance improved notably when the agent was equipped with specialized "skills" and connected to a Context7 Model Context Protocol (MCP) server, which supplied up-to-date documentation. The setup requires careful planning and context provisioning, and it lags behind cloud models such as GPT-4 or Claude Opus for casual "vibe coding." Even so, it proves that capable, private, and cost-effective agentic coding is achievable with current open-source models and consumer hardware. The developer documented the entire setup, including decisions on quantization, model selection, and chat templates, in a comprehensive blog post.
- Successfully ran Alibaba's Qwen3.5-27B model locally as the core LLM for the OpenCode agentic coding assistant.
- Achieved ~40 tok/s generation on an RTX 4090 using 4-bit quantization and a 64K context window.
- The agent performed complex coding tasks like writing and debugging scripts with correct tool calling, especially when enhanced with a documentation MCP server.
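Wiring a documentation MCP server into the agent is a configuration-level change. A hedged sketch of what this can look like in OpenCode's `opencode.json` (the field names and the Context7 launch command reflect my understanding of OpenCode's and Context7's documentation, not the blog post's exact config; verify against the current docs):

```json
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "context7": {
      "type": "local",
      "command": ["npx", "-y", "@upstash/context7-mcp"]
    }
  }
}
```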
Why It Matters
The experiment shows that powerful, private, and cost-effective AI coding agents are feasible today with open-source models and consumer hardware.