Evaluating Qwen3.5-35B & 122B on Strix Halo: Bartowski vs. Unsloth UD-XL Performance and Logic Stability
New 'dynamic' quantization method from Unsloth fails to match standard quants, showing 58% higher token usage and logic instability.
Independent benchmark testing has revealed significant performance and logic issues with Unsloth's newly released 'dynamic' quantization method for large language models. The tests, conducted on a Strix Halo platform using llama.cpp builds b8204 and b8248, compared Unsloth's UD-XL quants for Qwen3.5-35B and 122B models against standard Bartowski quantization. The results show the dynamic quants not only fall short of their advertised performance but also exhibit concerning reasoning instability and higher token consumption.
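For readers who want to run a similar head-to-head comparison, the sketch below drives llama.cpp's bundled llama-bench tool against two GGUF files and prints the reported throughput. The binary location, model paths, and GPU layer count are illustrative assumptions, not values from the original test setup.

```python
# Minimal sketch of a head-to-head llama-bench comparison.
# Paths and flag values are assumptions; adjust for your own build and models.
import json
import subprocess

LLAMA_BENCH = "./llama.cpp/build/bin/llama-bench"  # assumed build location
MODELS = {
    "bartowski-Q5_K_L": "models/bartowski/model-Q5_K_L.gguf",      # hypothetical path
    "unsloth-UD-Q5_K_XL": "models/unsloth/model-UD-Q5_K_XL.gguf",  # hypothetical path
}

def bench(model_path: str) -> list[dict]:
    """Run a short prompt-processing + generation benchmark and parse the JSON output."""
    result = subprocess.run(
        [LLAMA_BENCH, "-m", model_path, "-p", "512", "-n", "128", "-ngl", "99", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)

for name, path in MODELS.items():
    for row in bench(path):
        # Exact field names in the JSON output depend on the llama-bench version;
        # print the raw rows and pick out the tokens/sec figures from there.
        print(name, row)
```

This measures raw throughput only; the token-usage and reasoning-quality differences described in this article require running an actual generation task against each quant and counting the tokens consumed.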
In a practical coding test that asked the models to generate a 3D animated solar system in HTML, Unsloth's 122B-A10B-UD-Q5_K_XL quant needed 29,521 tokens and multiple attempts, while the standard Bartowski 122B-A10B-Q5_K_L quant completed the task in a single pass using only 18,700 tokens; the dynamic quant consumed roughly 58% more tokens. More troubling was the dynamic quant's odd reasoning behavior: it prefaced responses with unnecessary internal monologue, struggled to maintain logical consistency within its 100K-token context window, and offered 'weird solutions' that the standard quant avoided.
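The 58% figure follows directly from the two token counts reported above; a quick arithmetic check with those numbers plugged in:

```python
# Quick check of the token-overhead figure cited above.
dynamic_tokens = 29_521   # Unsloth 122B-A10B-UD-Q5_K_XL, multiple attempts
standard_tokens = 18_700  # Bartowski 122B-A10B-Q5_K_L, single pass

overhead = (dynamic_tokens - standard_tokens) / standard_tokens
print(f"Extra tokens used by the dynamic quant: {overhead:.1%}")  # ~57.9%
```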
The technical analysis shows that while recent llama.cpp optimizations (particularly for ROCm) provide some speed improvements, the dynamic quants still underperform. The tests were conducted on Debian Linux with RADV Mesa 26.0.0-1 and ROCm nightly 7.12.0a20260307, ensuring controlled comparison conditions. These findings suggest users should approach Unsloth's dynamic quantization with caution until further validation and improvements are made.
- Unsloth's 122B dynamic quant used 58% more tokens (29,521 vs 18,700) than the standard Bartowski quant on the same HTML task
- Dynamic quants exhibited odd reasoning behavior, including unnecessary 'Thinking:' monologues and logical inconsistencies
- New llama.cpp optimizations deliver ROCm speed improvements, but the dynamic quants still fall short of their advertised performance
Why It Matters
Quantization choices directly impact AI deployment costs and reliability—faulty methods waste compute resources and produce unstable outputs.