Open Source

Pushing a 5-Year-Old 6GB VRAM Laptop to Its Limits: Qwen3.6-35B-A3B

A 6GB RTX 2060 Max-Q laptop runs a 35B MoE model at 23 tokens per second.

Deep Dive

In a viral Reddit post, user abhinand05 documents how they pushed their 5-year-old ASUS ROG Zephyrus G14 (Ryzen 7 8C/16T, 24GB DDR4, RTX 2060 Max-Q 6GB) to run Qwen3.6-35B-A3B, a 35B-parameter Mixture-of-Experts model in which only about 3B parameters are active per token (the "A3B" suffix), which is what makes mostly-CPU inference feasible. Using llama-server with carefully tuned flags (aggressive CPU offloading, MoE expert weights for 36 layers kept on the CPU, a Q8_0-quantized KV cache, and a 64K-128K context window), they achieve roughly 23 tokens per second plugged in and over 10 t/s on battery, making the model genuinely usable for conversational AI. The setup also leverages Tom's fork to reach the 128K context length, demonstrating that even budget hardware from 2020 can handle state-of-the-art open models.
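
For readers who want to attempt something similar, a minimal sketch of such a launch (wrapped in Python's subprocess so it is self-contained) might look like the following. The model file name, flag values, and port are illustrative assumptions based on the post's description, not the author's exact command:

    import subprocess

    # Hypothetical launch of llama.cpp's llama-server approximating the setup
    # described in the post; every value below is an illustrative assumption.
    cmd = [
        "llama-server",
        "-m", "Qwen3.6-35B-A3B-Q4_K_M.gguf",  # assumed GGUF quantization and file name
        "-c", "65536",                         # 64K context (the post cites 64K-128K)
        "-ngl", "99",                          # offload as many layers as fit in 6GB VRAM
        "--n-cpu-moe", "36",                   # keep MoE expert weights of 36 layers on the CPU
        "--cache-type-k", "q8_0",              # Q8_0-quantized KV cache to save memory
        "--cache-type-v", "q8_0",              # (quantized V cache may also require flash attention)
        "-t", "16",                            # use the Ryzen 7's 16 hardware threads
        "--host", "127.0.0.1",
        "--port", "8080",
    ]
    subprocess.run(cmd, check=True)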

The community response highlights a broader trend: open-weight models are becoming increasingly accessible. The user shared their full configuration and a blog post detailing the 'localmaxxing' journey, emphasizing how far open source has come. With proper quantization (GGUF) and aggressive CPU offloading, even a modest 6GB laptop can run a 35B model. This challenges the assumption that heavy AI inference requires expensive cloud GPUs, opening up local AI use for developers, students, and hobbyists with older equipment.
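
As a rough illustration of how such a local server is then used, the sketch below queries llama-server's OpenAI-compatible chat endpoint with Python's standard library and estimates end-to-end throughput. The port and model name match the assumed launch command above and are not taken from the original post:

    import json
    import time
    import urllib.request

    # Hypothetical client for the local llama-server assumed above.
    payload = {
        "model": "qwen3.6-35b-a3b",  # illustrative; the server answers with whatever model it loaded
        "messages": [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}],
        "max_tokens": 256,
    }
    req = urllib.request.Request(
        "http://127.0.0.1:8080/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

    start = time.time()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.time() - start

    print(body["choices"][0]["message"]["content"])
    # End-to-end rate includes prompt processing, so it will read lower than
    # the ~23 t/s decode speed reported in the post.
    tokens = body["usage"]["completion_tokens"]
    print(f"~{tokens / elapsed:.1f} tokens/sec end-to-end")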

Key Points
  • Runs Qwen3.6-35B-A3B at 23 t/s on a 2020 laptop with 6GB VRAM and 24GB RAM
  • Uses llama-server with CPU offloading, a Q8_0 KV cache, and MoE expert weights for 36 layers kept on the CPU
  • Achieves 10+ t/s on battery and supports 128K context via Tom's fork

Why It Matters

Proves modern 35B open models can run locally on 5-year-old laptops, democratizing AI access.