Open Source

Qwen3.5-9B Quantization Comparison

A comprehensive sweep of 46 GGUF quants identifies the most faithful compressed versions of the 9B model.

Deep Dive

A detailed community analysis presents a rigorous quantization sweep of Alibaba's Qwen3.5-9B model, comparing 46 different GGUF files to establish a data-driven selection guide. The evaluation uses KL Divergence (KLD) to measure how far a quantized model's token probability distribution drifts from the original BF16 baseline, with lower scores indicating greater faithfulness; Perplexity (PPL) captures overall next-token prediction error. The results reveal significant performance differences between quantizers like bartowski, unsloth, and lmstudio, even for files with the same nominal quantization level (e.g., Q4_K_M).
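The core metric can be sketched in a few lines. This is a minimal illustration of per-token KL divergence between a baseline and a quantized model, not the analysis's actual evaluation harness; the function names and the assumption that both models are scored on identical token positions are mine:

```python
import math

def softmax(logits):
    # Convert raw logits into a probability distribution (max-subtracted for stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_kld(baseline_logits, quant_logits):
    # KL(P_baseline || Q_quant) at one token position:
    # sum over the vocabulary of p * log(p / q).
    p = softmax(baseline_logits)
    q = softmax(quant_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_kld(baseline_seq, quant_seq):
    # Average the per-token divergence over an evaluation corpus.
    # Identical distributions give 0; larger values mean more drift.
    klds = [token_kld(b, q) for b, q in zip(baseline_seq, quant_seq)]
    return sum(klds) / len(klds)
```

A lossless quant would score exactly 0; any real quantization yields a small positive value, which is what the rankings compare.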

For users constrained by VRAM, bartowski's IQ4_XS quant (4.93 GiB, KLD 0.0127) is the top recommendation, offering the best performance without dropping below Q4 precision. The analysis shows that Q2 and IQ2 quants are measurably worse, and repetition loops observed in text generation align with higher KLD scores. The full ranking, visualized in a Hugging Face Space, allows developers to pick the optimal file for their specific hardware and accuracy needs, moving beyond guesswork to informed selection.

Key Points
  • bartowski's IQ4_XS quant (4.93 GiB) is the best option for VRAM-limited users, with a KLD of 0.0127.
  • Significant variance exists between quantizers; bartowski's Q4_K_M (KLD 0.0087) vastly outperforms lmstudio's version (KLD 0.0353).
  • The Q8_0 quantization offers the highest fidelity to the original model, with the lowest recorded KLD of 0.000814.
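The selection logic implied by these findings is simple to express: filter by what fits in VRAM, then pick the lowest KLD. The sketch below uses the KLD figures quoted above, but the file sizes other than IQ4_XS's 4.93 GiB are illustrative placeholders, not measurements from the analysis:

```python
# KLD values are from the analysis; sizes other than IQ4_XS are placeholders.
QUANTS = [
    {"name": "bartowski Q8_0",   "gib": 9.1,  "kld": 0.000814},
    {"name": "bartowski Q4_K_M", "gib": 5.3,  "kld": 0.0087},
    {"name": "bartowski IQ4_XS", "gib": 4.93, "kld": 0.0127},
    {"name": "lmstudio Q4_K_M",  "gib": 5.3,  "kld": 0.0353},
]

def best_quant(budget_gib, quants=QUANTS):
    # Return the most faithful (lowest-KLD) file that fits the VRAM budget,
    # or None if nothing fits.
    fitting = [q for q in quants if q["gib"] <= budget_gib]
    return min(fitting, key=lambda q: q["kld"]) if fitting else None
```

With a 5 GiB budget this picks IQ4_XS; with headroom to spare it picks Q8_0, matching the article's recommendations.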

Why It Matters

Provides developers with empirical data to choose the right compressed model, optimizing local AI performance versus hardware cost.