Open Source

Qwen3.5-35B-A3B Q4 Quantization Comparison

A new community benchmark reveals which Q4 quantization preserves the most model quality for Alibaba's 35B-parameter model.

Deep Dive

A comprehensive community analysis has benchmarked various Q4 quantization methods for Alibaba's Qwen3.5-35B-A3B model, providing developers with concrete data to choose the optimal compressed version for local deployment. The benchmark, which compared files from quantizers including AesSedai, bartowski, and Unsloth, focused on two key metrics: KL Divergence (KLD), which measures how far the quantized model's output probability distribution drifts from the original BF16 baseline, and Perplexity (PPL), which gauges how well the model predicts text; lower is better on both. The goal was to move beyond guesswork and offer a performance-based selection guide for a model that balances strong capabilities with manageable resource requirements.
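Both metrics are standard and easy to sketch. Assuming you have per-token logits over the same vocabulary from the BF16 baseline and the quantized model (the function names and array shapes below are illustrative, not the benchmark's actual harness), a minimal NumPy version looks like this:

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def mean_kld(baseline_logits, quant_logits):
    """Mean KL(P_baseline || P_quant) across token positions.

    0.0 means the quantized model reproduces the BF16 output
    distribution exactly; larger values mean more drift.
    """
    log_p = log_softmax(baseline_logits)
    log_q = log_softmax(quant_logits)
    p = np.exp(log_p)
    return float((p * (log_p - log_q)).sum(axis=-1).mean())

def perplexity(logits, target_ids):
    """PPL = exp(mean negative log-likelihood of the true next tokens)."""
    log_probs = log_softmax(logits)
    nll = -log_probs[np.arange(len(target_ids)), target_ids]
    return float(np.exp(nll.mean()))
```

Two sanity checks fall out of the definitions: identical logits give a KLD of exactly 0, and a model that spreads probability uniformly over a V-token vocabulary scores a PPL of exactly V.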

The results identified clear leaders: AesSedai's Q4_K_M quantization achieved the lowest KLD score of 0.0102 by strategically protecting critical tensors such as attention weights. For the best efficiency, meaning a balance of small file size and preserved accuracy, the IQ4_XS variants, particularly AesSedai's, led with an efficiency score of 0.327. The analysis also noted that Unsloth's UD-Q4_K_XL recipe, while compact, currently has the highest KLD (0.0524), though the team is actively working on improvements. This benchmark lets developers select quantizations that minimize the information loss introduced by compression, keeping the locally run model as faithful as possible to its original, more resource-intensive version.
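The practical decision rule that falls out of these numbers is simple: prefer the lowest KLD that fits your hardware. A tiny helper that ranks candidates by faithfulness, hard-coding only the two KLD figures actually reported above, might look like:

```python
# KLD per quantized file, as reported in the benchmark
# (lower = closer to the BF16 baseline). Only the figures
# quoted in this analysis are included here.
reported_kld = {
    "AesSedai Q4_K_M": 0.0102,
    "Unsloth UD-Q4_K_XL": 0.0524,
}

def rank_by_faithfulness(kld_scores):
    """Return quant names sorted from most to least faithful."""
    return sorted(kld_scores, key=kld_scores.get)

print(rank_by_faithfulness(reported_kld))
# -> ['AesSedai Q4_K_M', 'Unsloth UD-Q4_K_XL']
```

In practice you would extend the dictionary with the full benchmark table and filter by your available VRAM or disk budget before ranking.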

Key Points
  • AesSedai's Q4_K_M quantization for Qwen3.5-35B-A3B achieved the best faithfulness score with a KLD of 0.0102.
  • The IQ4_XS variant offers the best efficiency score (0.327), balancing a 16.4 GB size with solid performance.
  • The benchmark provides a crucial data-driven guide for developers choosing between file size and model quality for local AI.

Why It Matters

Enables developers to deploy powerful 35B-parameter models locally with confidence, choosing the optimal balance of speed, size, and accuracy.