Qwen3.5-35B-A3B is a game-changer for agentic coding.
A new open-source AI model completes a 5-hour coding test in 10 minutes using just 22GB of VRAM.
Alibaba's Qwen research team has released Qwen3.5-35B-A3B, an open-weights AI model demonstrating remarkable coding capabilities on consumer hardware. Independent testing shows the 35-billion-parameter model running efficiently on a single RTX 3090 GPU via llama.cpp, consuming only 22GB of VRAM while generating over 100 tokens/second. Most impressively, the model completed a comprehensive mid-level mobile developer coding test, designed to take human developers approximately 5 hours, in just 10 minutes: a 30x speedup. This performance suggests open-source models are reaching new thresholds for practical, agentic coding assistance.
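The "30x" figure follows directly from the two timings quoted above; a quick sanity check:

```python
# Back-of-envelope check of the reported speedup, using the article's figures.
human_minutes = 5 * 60   # 5-hour mid-level developer coding test
model_minutes = 10       # reported model completion time

speedup = human_minutes / model_minutes
print(speedup)  # 30.0 -> matches the claimed 30x improvement
```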
The technical achievement lies in the combination of efficiency and quality. Using MXFP4_MOE quantization via the GGUF format, Qwen3.5-35B-A3B maintains high reasoning quality while remaining accessible to developers on consumer-grade hardware. In another test, it recreated a complex dashboard interface (similar to Cursor's well-known demo) in approximately 5 minutes, a task that previously required Claude Code. This positions Qwen3.5 as the first open-weights model capable of matching proprietary agentic coding stacks, such as early versions of Anthropic's Claude Sonnet driving tools like Kodu.AI, on accessible hardware, potentially democratizing advanced coding assistance beyond cloud API dependencies.
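The 22GB VRAM figure is plausible under MXFP4 quantization. A rough sketch, assuming MXFP4's standard layout of 4-bit elements with one shared 8-bit scale per 32-value block (about 4.25 effective bits per weight):

```python
# Rough VRAM estimate for a 35B-parameter model under MXFP4 quantization.
# Assumption: MXFP4 packs 4-bit elements plus one shared 8-bit scale per
# 32-value block, for an effective footprint of 4 + 8/32 = 4.25 bits/weight.
params = 35e9
bits_per_weight = 4 + 8 / 32           # 4.25 effective bits per weight
weight_gb = params * bits_per_weight / 8 / 1e9

print(round(weight_gb, 1))  # ~18.6 GB for the weights alone
```

Weights alone land near 18.6 GB; the KV cache and runtime buffers would plausibly account for the remainder of the observed ~22GB on a 24GB RTX 3090.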
- Runs on consumer hardware: single RTX 3090 GPU using 22GB of VRAM via llama.cpp
- Completes 5-hour coding assessments in 10 minutes (30x faster than human developers)
- Achieves over 100 tokens/second generation speed with MXFP4_MOE quantization
Why It Matters
Democratizes advanced coding assistance by running powerful AI agents on affordable hardware instead of expensive cloud APIs.