Open Source

smol-IQ2_XS 113.41 GiB (2.46 BPW)

A breakthrough quantization recipe squeezes a massive 397B-parameter model into consumer hardware.

Deep Dive

A developer has released a custom quantization recipe for the massive Qwen3.5-397B-A17B model, producing a 113.41 GiB version that fits within a 128GB memory budget. This 'smol-IQ2_XS' quant keeps the attention layers at full Q8_0 precision, aiming for the best possible quality in its size class. While more advanced 'ik_llama.cpp' quant types are still pending, this makes one of the world's largest open models runnable on systems built around high-end consumer GPUs like the RTX 4090.
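The headline numbers can be sanity-checked with a little arithmetic: bits per weight (BPW) is just total model size in bits divided by parameter count. A minimal sketch, assuming the 397e9 parameter count read off the model name (actual totals vary slightly with metadata and embedding overhead, which is why the result lands near, not exactly at, the reported 2.46 BPW):

```python
# Sanity-check the quant's headline numbers: file size vs. bits per weight (BPW).

GIB = 1024 ** 3  # bytes per GiB


def bits_per_weight(size_gib: float, n_params: float) -> float:
    """Total model size in bits divided by parameter count."""
    return size_gib * GIB * 8 / n_params


# 397e9 params is an assumption taken from the model name "Qwen3.5-397B-A17B";
# the true tensor count differs slightly, nudging the result toward 2.46.
bpw = bits_per_weight(113.41, 397e9)
print(f"{bpw:.2f} BPW")  # -> 2.45 BPW

# Headroom left for KV cache, activations, and runtime overhead,
# treating the 128GB budget as 128 GiB (as RAM capacities usually are).
headroom_gib = 128 - 113.41
print(f"{headroom_gib:.2f} GiB headroom")  # -> 14.59 GiB headroom
```

The thin headroom is why the size class matters: a quant even a few GiB larger would leave too little room for context at this memory budget.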

Why It Matters

This democratizes access to frontier-scale AI models, allowing researchers and developers to run them on affordable, single-GPU systems.