48GB VRAM May Not Cut It for Qwen 3.6 27B at Q8 with 262K Context
Uncompressed KV cache at 262K context on a 27B model demands VRAM far beyond 48GB.
Deep Dive
A Reddit user asks whether 48GB of VRAM is enough to run a model at Q8 quantization with an uncompressed KV cache and a 262K context length, without naming a specific model. They currently run IQ4_XS weights with Q4 KV cache compression and are seeking GPU upgrade advice. The post itself does not provide VRAM estimates or a definitive answer.
Key Points
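- Qwen 3.6 27B at Q8 requires ~27GB for weights (about one byte per parameter, slightly more once quantization scales are counted); an uncompressed (FP16) KV cache for 262K tokens adds roughly 50–100GB, depending on the model's layer count, KV-head count, and head dimension.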
- Current setup uses IQ4_XS (roughly 4-bit weights) and a Q4 KV cache to fit within limited VRAM; the target setup would roughly double the memory for weights and roughly quadruple the memory for the KV cache.
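- 48GB is likely insufficient for uncompressed KV at 262K context; users may need 80GB or more (e.g., an 80GB A100, since dual 4090s total only 48GB) or must adopt smarter caching strategies such as keeping the KV cache quantized or shortening the context.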
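The arithmetic behind these estimates can be sketched in a few lines of Python. This is a rough budget, not a measurement: the layer count, KV-head count, and head dimension below are hypothetical placeholders for a dense 27B-class model with grouped-query attention, not confirmed Qwen 3.6 27B values, and the bits-per-weight figures assume llama.cpp-style block quantization (8.5 for Q8_0, 4.25 for IQ4_XS, 4.5 for Q4_0). Runtime overhead such as compute buffers is ignored.

```python
GIB = 1024 ** 3

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bits_per_elem: float) -> float:
    """Approximate KV cache size in bytes for standard attention layers.

    The factor 2 covers one K and one V tensor per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bits_per_elem / 8

# Assumed (hypothetical) architecture; replace with the real model config.
N_PARAMS = 27e9
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128
N_TOKENS = 262_144  # 262K context

# Target setup: Q8_0 weights, uncompressed FP16 KV cache.
target = (weight_bytes(N_PARAMS, 8.5)
          + kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_TOKENS, 16))

# Current setup: IQ4_XS weights, Q4_0 KV cache.
current = (weight_bytes(N_PARAMS, 4.25)
           + kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_TOKENS, 4.5))

print(f"target : {target / GIB:.1f} GiB")
print(f"current: {current / GIB:.1f} GiB")
print(f"fits in 48 GiB: {target <= 48 * GIB}")
```

Under these assumed dimensions the FP16 KV cache alone comes to 64 GiB, so the target setup lands well above 48GB before any overhead is counted. A model with fewer KV heads or fewer standard attention layers would need proportionally less, which is why the real config matters more than the parameter count.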
Why It Matters
VRAM budgeting for long-context local LLMs remains a key bottleneck, influencing GPU purchase decisions and model quantization choices.