Open Source

Qwen3.5 27B vs 35B Unsloth quants - LiveCodeBench Evaluation Results

Smaller quantized model outperforms larger counterpart, scoring 34.8% vs 11.0% on LiveCodeBench tests.

Deep Dive

A comprehensive benchmark evaluation of Qwen's latest models reveals surprising performance dynamics in code generation tasks. Using LiveCodeBench on an RTX 4060 Ti setup, testers compared Qwen3.5-27B-UD-IQ3_XXS (10.7GB) against Qwen3.5-35B-A3B-IQ4_XS (17.4GB) across multiple difficulty levels and time periods. The smaller 27B model consistently outperformed its larger counterpart, achieving 34.8% overall accuracy versus just 11.0% for the 35B model. Most strikingly, the 27B model scored 25.0% on medium-difficulty problems against 4.2% for the 35B model, a roughly 6x performance gap.
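Scores like these are simple pass rates bucketed by difficulty tier. A minimal sketch of that aggregation, assuming a flat list of per-problem outcomes (the function name and data layout here are illustrative, not LiveCodeBench's actual harness):

```python
from collections import defaultdict

def accuracy_by_difficulty(results):
    """Aggregate (difficulty, passed) pairs into percentage accuracies.

    Returns per-tier accuracy plus an 'overall' entry, each rounded to
    one decimal place, matching how the benchmark numbers are quoted.
    """
    totals = defaultdict(lambda: [0, 0])  # difficulty -> [passed, attempted]
    for difficulty, passed in results:
        totals[difficulty][0] += int(passed)
        totals[difficulty][1] += 1
    report = {d: round(100.0 * p / n, 1) for d, (p, n) in totals.items()}
    solved = sum(p for p, _ in totals.values())
    attempted = sum(n for _, n in totals.values())
    report["overall"] = round(100.0 * solved / attempted, 1)
    return report

# Toy data, not the article's actual runs: 1 of 4 medium problems solved.
demo = [("easy", True), ("easy", True), ("medium", True),
        ("medium", False), ("medium", False), ("medium", False)]
print(accuracy_by_difficulty(demo))
# → {'easy': 100.0, 'medium': 25.0, 'overall': 50.0}
```

The same bucketing by problem release date (e.g. April-May 2025) is what exposes the 35B model's degradation on newer, likely uncontaminated problems.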

Technical analysis reveals the 35B model's performance degraded significantly on newer problems from April-May 2025, scoring 0% compared to the 27B model's 25.0%. Even the 9B-Q6 model (8.15GB) outperformed the 35B model on these recent problems, with 16.7% accuracy. The 35B model's poor results persisted despite attempts to improve them with a different quantization (Q5_K_XL), a longer context window (150k), and disabled thinking mode. These findings suggest that quantization strategy and model architecture may matter more than raw parameter count for specific tasks like code generation, challenging conventional wisdom in model selection.

Key Points
  • Qwen3.5-27B-UD-IQ3_XXS (10.7GB) scored 34.8% overall accuracy vs 11.0% for Qwen3.5-35B-A3B-IQ4_XS (17.4GB) on LiveCodeBench
  • The 27B model showed 6x better performance on medium-difficulty problems (25.0% vs 4.2%) despite using lower quantization
  • The 35B model scored 0% on April-May 2025 problems while the 27B maintained 25.0% accuracy, showing significant degradation on newer tasks

Why It Matters

Developers can achieve better code generation results with smaller, efficiently quantized models, challenging the "bigger is always better" assumption in AI deployment.