Open Source

Qwen3.5-35B-A3B is a game changer for agentic coding.

A new open-source AI model completes a 5-hour coding test in 10 minutes using just 22GB of VRAM.

Deep Dive

Alibaba's Qwen research team has released Qwen3.5-35B-A3B, an open-weights AI model demonstrating remarkable coding capabilities on consumer hardware. Independent testing shows the 35-billion-parameter model running efficiently on a single RTX 3090 GPU via Llama.cpp, consuming only 22GB of VRAM while generating over 100 tokens/second. Most impressively, the model completed a comprehensive mid-level mobile-developer coding test, designed to take human developers approximately 5 hours, in just 10 minutes: a 30x speedup. This performance suggests open-source models are reaching new thresholds for practical, agentic coding assistance.
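The headline numbers above can be sanity-checked with simple arithmetic. This sketch only uses figures reported in the article (the 5-hour human baseline, the 10-minute model run, and the 100 tokens/second rate); the derived token count is an upper bound, not a measured value.

```python
# Back-of-envelope check of the reported speedup and output volume.
human_minutes = 5 * 60        # 5-hour human benchmark, in minutes
model_minutes = 10            # reported model completion time

speedup = human_minutes / model_minutes
print(f"Speedup: {speedup:.0f}x")          # 300 / 10 = 30x

tokens_per_second = 100       # reported generation speed
max_tokens = tokens_per_second * model_minutes * 60
print(f"Upper bound on tokens generated in the run: {max_tokens:,}")  # 60,000
```

At 100 tokens/second, a 10-minute run caps out at 60,000 generated tokens, which gives a sense of how much code and reasoning fits inside that window.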

The technical achievement lies in the model's combination of efficiency and quality. Using MXFP4_MOE quantization in the GGUF format, Qwen3.5-35B-A3B maintains high reasoning quality while remaining accessible to developers with consumer-grade hardware. In another test, it recreated a complex dashboard interface (similar to the Cursor demo) in approximately 5 minutes, a task that previously required Claude Code. This positions Qwen3.5 as one of the first open-weights models capable of matching proprietary agentic coding tools, such as early versions of Anthropic's Claude Sonnet or Kodu.AI, on accessible hardware, potentially democratizing advanced coding assistance beyond cloud API dependencies.
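A rough estimate shows why MXFP4 quantization makes the 22GB figure plausible. MXFP4 stores 4-bit elements in blocks of 32 that share an 8-bit scale, i.e. about 4 + 8/32 = 4.25 bits per weight; the exact parameter count and block size here are assumptions for the estimate, not confirmed specs of this model.

```python
# Sanity check: do 35B parameters at ~4.25 bits/weight fit under 22GB?
params = 35e9
bits_per_weight = 4 + 8 / 32            # 4-bit element + amortized 8-bit block scale
weight_bytes = params * bits_per_weight / 8

print(f"Quantized weights: ~{weight_bytes / 1e9:.1f} GB")        # ~18.6 GB
print(f"Headroom under 22GB: ~{(22e9 - weight_bytes) / 1e9:.1f} GB")  # ~3.4 GB
```

The remaining ~3.4 GB of headroom up to the observed 22GB is what the KV cache, activations, and runtime buffers would occupy, which is consistent with a single-GPU deployment at a moderate context length.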

Key Points
  • Runs on consumer hardware: Single RTX 3090 GPU with 22GB VRAM via Llama.cpp
  • Completes 5-hour coding assessments in 10 minutes (30x faster than human developers)
  • Achieves over 100 tokens/second generation speed with MXFP4_MOE quantization

Why It Matters

Democratizes advanced coding assistance by running powerful AI agents on affordable hardware instead of expensive cloud APIs.