Open Source

Switched from Qwen3.6 35b-a3b to Qwen3.6 27b mid coding and it's noticeably better!

Dense 27B at IQ3_M beats MoE 35B-A3B at IQ4_XS on 16GB VRAM

Deep Dive

A developer building an HTML tower defense game with waypoint path logic on a 32GB RAM, 16GB VRAM RTX 5070 Ti system switched from Qwen3.6 35B-A3B (MoE) to Qwen3.6 27B (dense) mid-project and reported that the 27B performed noticeably better. Running the 35B-A3B at IQ4_XS and the 27B at IQ3_M, the dense model not only held up well under the more aggressive quantization but also fixed a difficult bug the larger MoE model couldn't solve. The developer notes that dense models tend to tolerate heavy quantization better than mixture-of-experts architectures, which may explain the quality difference.
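
As a rough illustration of why those quantization levels matter on a 16GB card, here is a back-of-envelope weight-size estimate in Python. The bits-per-weight figures are approximate values for llama.cpp's IQ4_XS and IQ3_M formats (an assumption, not exact numbers), the parameter counts are taken at face value from the model names, and KV cache and runtime overhead are ignored, so treat the output as ballpark only.

```python
# Rough GGUF weight footprint: params * bits-per-weight / 8.
# Ignores KV cache, context buffers, and runtime overhead.
# Bits-per-weight values are approximate llama.cpp figures (assumption).
BPW = {"IQ4_XS": 4.25, "IQ3_M": 3.66}

def weight_gib(params_billion: float, quant: str) -> float:
    """Approximate weight footprint in GiB for a given quant format."""
    bytes_total = params_billion * 1e9 * BPW[quant] / 8
    return bytes_total / 2**30

for name, params, quant in [("Qwen3.6 35B-A3B", 35, "IQ4_XS"),
                            ("Qwen3.6 27B", 27, "IQ3_M")]:
    print(f"{name} @ {quant}: ~{weight_gib(params, quant):.1f} GiB of weights")
```

Under these assumptions the 27B's weights land around 11-12 GiB and fit inside 16GB VRAM with room for context, while the 35B-A3B at IQ4_XS comes out above 17 GiB and would need partial CPU offload, one plausible contributor to its slower prompt processing.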

Performance-wise, both models delivered 40-50 tokens per second on the 5070 Ti, but the 27B held that speed more consistently, while the 35B-A3B suffered from slow prompt processing. The developer completed the game, a self-contained single HTML file called Waypoint Tower Defense playable on htmlbin, and recommends that others with 16GB VRAM try IQ3_M quantizations of 27B-class models for real work. The experience highlights that model architecture (dense vs. MoE) can matter more than raw parameter count at local inference scales.
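
For anyone who wants to reproduce the speed comparison, a minimal sketch using the llama-cpp-python bindings is shown below. The GGUF paths are placeholders for whatever IQ4_XS and IQ3_M quantizations you actually download, and the rate it prints is end to end, combining prompt processing with generation, which is exactly where the two models diverged in this report.

```python
# Minimal local throughput check with llama-cpp-python
# (pip install llama-cpp-python, built with GPU support).
import time
from llama_cpp import Llama

MODELS = {
    "Qwen3.6-35B-A3B IQ4_XS": "qwen3.6-35b-a3b-iq4_xs.gguf",  # placeholder path
    "Qwen3.6-27B IQ3_M": "qwen3.6-27b-iq3_m.gguf",            # placeholder path
}

PROMPT = ("Write a JavaScript function that moves a tower-defense enemy "
          "along a list of waypoints at constant speed.")

for name, path in MODELS.items():
    llm = Llama(model_path=path, n_gpu_layers=-1, n_ctx=8192, verbose=False)
    start = time.perf_counter()
    out = llm(PROMPT, max_tokens=256)
    elapsed = time.perf_counter() - start
    usage = out["usage"]
    total = usage["prompt_tokens"] + usage["completion_tokens"]
    # End-to-end rate: prompt evaluation plus generation combined.
    print(f"{name}: {usage['completion_tokens']} tokens generated, "
          f"{total / elapsed:.1f} tok/s end to end")
    del llm  # release the model so the next one can fit in VRAM
```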

Key Points
  • Qwen3.6 27B (dense) at IQ3_M outperformed Qwen3.6 35B-A3B (MoE) at IQ4_XS on a coding task
  • Dense models handle quantization better than MoE, maintaining quality at fewer bits per weight
  • Both models ran at 40-50 tokens/sec on 16GB VRAM, but the 27B had faster prompt processing

Why It Matters

Local AI developers can get better coding performance from a smaller dense model than a larger MoE one within the same VRAM budget.