Open Source

Comparing Qwen3.5 vs Gemma4 for Local Agentic Coding

Independent benchmarks show Alibaba's Qwen3.5-27B produces cleaner code and fits better on consumer GPUs than Google's new Gemma4.

Deep Dive

Independent developer Aayush Garg published a head-to-head comparison of Alibaba's Qwen3.5 models and Google's newly released Gemma4 family for local agentic coding—where AI models autonomously execute multi-step coding workflows. The benchmarks, run on an RTX 4090 with 24GB of VRAM, combined standard speed tests with practical, single-shot coding challenges. The clear winner was Alibaba's Qwen3.5-27B, a dense model that produced the cleanest, most correct code with proper API usage, type hints, and docstrings, all while fitting comfortably within the GPU's memory constraints.
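For context, here is a minimal sketch of what a single-shot coding challenge harness can look like. It assumes an OpenAI-compatible local server (llama.cpp, vLLM, Ollama, and similar runtimes expose this API) listening on localhost; the endpoint, model tag, prompt, and pass/fail checks are illustrative stand-ins, not Garg's actual test suite.

```python
import requests

BASE_URL = "http://localhost:8080/v1"  # assumed OpenAI-compatible local server
MODEL = "qwen3.5-27b"                  # hypothetical local model tag

PROMPT = (
    "Write a Python function moving_average(values, window) that returns the "
    "list of windowed means. Use type hints and a docstring. Return only code."
)


def run_single_shot(prompt: str) -> str:
    """Send one request with no retries: the 'single-shot' part of the challenge."""
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
            "max_tokens": 1024,
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


def extract_code(completion: str) -> str:
    """Strip any markdown fence lines so the reply body can be compiled directly."""
    fence = chr(96) * 3  # a literal triple-backtick, written indirectly
    return "\n".join(
        ln for ln in completion.splitlines() if not ln.strip().startswith(fence)
    )


def score(code: str) -> dict:
    """Crude structural checks standing in for a real review rubric."""
    return {
        "compiles": _compiles(code),
        "has_type_hints": "->" in code,
        "has_docstring": '"""' in code or "'''" in code,
    }


def _compiles(code: str) -> bool:
    try:
        compile(code, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(score(extract_code(run_single_shot(PROMPT))))
```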

While Google's Gemma4-26B-A4B, a mixture-of-experts (MoE) model, generated significantly faster at ~135 tokens per second, roughly 3x the dense Qwen3.5-27B, it faltered on reliability. Both MoE models tested (Gemma4-26B-A4B and Qwen3.5-35B) required retries on complex tasks, whereas the dense models succeeded on the first attempt. The analysis also surfaced practical limitations: Qwen3.5-35B was overly verbose, and Gemma4-31B was severely context-limited, requiring its context window to be cut to 65K tokens to maintain performance on the test hardware.
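The throughput numbers themselves are easy to reproduce at home. A rough sketch, again assuming an OpenAI-compatible local server that returns a usage block (as most such runtimes do) and hypothetical model tags: divide completion tokens by wall-clock time.

```python
import time
import requests

BASE_URL = "http://localhost:8080/v1"  # assumed OpenAI-compatible local server


def tokens_per_second(model: str, prompt: str, max_tokens: int = 512) -> float:
    """Rough decode-throughput estimate: completion tokens / wall-clock seconds.

    Wall-clock time includes prompt processing, so this slightly understates
    pure generation speed; published figures usually come from the server's
    own timing logs.
    """
    start = time.perf_counter()
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
        timeout=600,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    completion_tokens = resp.json()["usage"]["completion_tokens"]
    return completion_tokens / elapsed


if __name__ == "__main__":
    prompt = "Explain Python's GIL in three paragraphs."
    for model in ("qwen3.5-27b", "gemma4-26b-a4b"):  # hypothetical local tags
        print(model, f"{tokens_per_second(model, prompt):.1f} tok/s")
```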

The detailed findings underscore a critical trade-off in the local AI coding space: raw speed versus coding accuracy and reliability. For developers prioritizing correct, production-ready code generation on consumer-grade hardware, the Qwen3.5-27B emerges as the most efficient and dependable option, despite not being the fastest. This benchmark provides crucial, real-world data for engineers choosing a model to power local coding agents and autonomous workflows.
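To see why the memory constraints bite the way they do, a back-of-envelope estimate is enough: quantized weights cost roughly half a byte per parameter, while the KV cache grows linearly with context length, so a larger model on a fixed 24GB card can only claw back headroom by shrinking its context window. The bits-per-weight, layer count, head dimensions, and KV-cache precision below are illustrative placeholders, not the published Qwen3.5 or Gemma4 configurations.

```python
def weight_gb(params_b: float, bits_per_weight: float = 4.5) -> float:
    """Quantized weight footprint in GB; ~4.5 bits/weight approximates a Q4-style quant."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9


def kv_cache_gb(context: int, layers: int = 60, kv_heads: int = 4,
                head_dim: int = 128, bytes_per_elem: int = 1) -> float:
    """KV-cache footprint in GB: K and V tensors per layer, per cached token.

    Defaults assume grouped-query attention and an 8-bit KV cache; they are
    placeholders, not the real model configs.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * context / 1e9


# Weights are a fixed cost; the KV cache is what the context window buys.
print(f"27B weights ~ {weight_gb(27):.1f} GB, 31B weights ~ {weight_gb(31):.1f} GB")
for ctx in (32_768, 65_536, 131_072):
    print(f"KV cache at {ctx:>7}-token context ~ {kv_cache_gb(ctx):.1f} GB")
```

With a couple of gigabytes reserved for activations and the runtime itself, these rough placeholder numbers are at least consistent with the reported behavior: a dense ~27B model fits on a 24GB card with headroom, while a ~31B model at full context does not and has to trade context length for memory.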

Key Points
  • Qwen3.5-27B produced the cleanest, most correct code with proper APIs and structure, winning the practical coding challenge.
  • Google's MoE Gemma4-26B-A4B was ~3x faster (~135 tok/s vs ~45 tok/s) but less reliable, requiring retries on complex tasks.
  • On a 24GB RTX 4090, Qwen3.5-27B fits in 21GB VRAM, while Gemma4-31B's context had to be reduced to 65K to maintain speed.

Why It Matters

Provides concrete data for developers choosing a local AI coding assistant, highlighting the trade-off between generation speed and code reliability on consumer hardware.