Open Source

Qwen3.5 122B in 72GB VRAM (3x RTX 3090) is the best local model available right now, and it nails the “car wash test”

The 122-billion-parameter model runs at 25 tokens/sec on three consumer GPUs, outperforming larger models.

Deep Dive

A breakthrough in accessible high-performance AI has emerged from the open-source community: a user reports successfully running Alibaba's massive Qwen3.5 122B language model on consumer-grade hardware. The 122-billion-parameter model, which notably aces the complex-reasoning 'car wash test,' loads entirely into 72GB of VRAM (three NVIDIA RTX 3090 GPUs) and achieves a solid inference speed of 25 tokens per second. This challenges the notion that state-of-the-art models are reserved for data centers, demonstrating that with careful quantization they can run on enthusiast setups. The result highlights the rapid democratization of AI capabilities, with open-weight models closing the gap with proprietary giants.

Technical optimization is key to this achievement. The user found that Q3_K quantization fits the model within the GPU memory budget while performing on par with more memory-intensive 4-bit quantizations such as MXFP4 and IQ4_XS. With 'Thinking mode' enabled, a temperature of 0.6, and a 120k-token context window, the model runs stably, without the 'endless loop' issues seen in other configurations. It is slightly slower than some alternatives like GPT-OSS-120B, but its smaller memory footprint allows the model to load fully onto the GPUs, avoiding the severe speed penalty of offloading layers to RAM. The post provides a practical blueprint for running one of the most capable open models available today on affordable hardware.
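The memory arithmetic behind the quantization choice can be sketched roughly. The bits-per-weight figures below are approximate community values for these formats, not measured GGUF file sizes, and the estimate covers weights only (KV cache and activations are extra):

```python
# Rough VRAM estimate for a 122B-parameter model at different
# average quantization bit-widths. Bits-per-weight values are
# approximations, not exact GGUF measurements.
PARAMS = 122e9  # parameter count

def weight_gb(bits_per_weight: float) -> float:
    """Approximate model weight size in GiB at a given average bits per weight."""
    return PARAMS * bits_per_weight / 8 / 1024**3

if __name__ == "__main__":
    for name, bpw in [("Q3_K   (~3.4 bpw)", 3.4),
                      ("IQ4_XS (~4.25 bpw)", 4.25),
                      ("MXFP4  (~4.25 bpw)", 4.25),
                      ("FP16   (16 bpw)", 16.0)]:
        print(f"{name:20s} ~ {weight_gb(bpw):6.1f} GiB")
```

By this estimate, Q3_K weights come to roughly 48 GiB, leaving around 24 GiB of the 72GB pool for the 120k-token KV cache, while ~60 GiB of 4-bit weights leave far less headroom, which is consistent with the need to drop to Q3_K for a full GPU load.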

Key Points
  • Runs fully on GPU with 72GB VRAM (3x RTX 3090) at 25 tokens/sec, avoiding slow RAM offload.
  • Uses Q3_K quantization to perform on par with 4-bit versions while allowing a 120k context window.
  • Excels at the 'car wash test,' a benchmark for complex, multi-step reasoning and instruction following.
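For readers wanting to try a similar setup, a hypothetical llama.cpp server launch matching the reported settings might look like the sketch below. The GGUF filename is a placeholder, and `-ngl 99` simply requests that all layers be offloaded to the GPUs (llama.cpp splits them across the three 3090s automatically):

```shell
# Sketch of a llama.cpp server launch with the reported settings.
# Model filename is a placeholder for the actual Q3_K GGUF file.
llama-server \
  -m Qwen3.5-122B-Q3_K.gguf \
  -ngl 99 \
  -c 120000 \
  --temp 0.6
```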

Why It Matters

Democratizes access to frontier AI models, allowing developers and researchers to run 122B-parameter models on ~$3k worth of consumer GPUs.