Adding an old RTX 2070 Super to a 5090 rig boosts local AI by 8GB VRAM
8GB extra VRAM from a dusty 2070 Super runs Qwen3.6-27B at 40-70 tk/s...
Deep Dive
A developer added an old GTX 2070 (8GB) to a high-end RTX 5090/9800X3D/96GB RAM system. The extra VRAM unlocks Qwen3.6-27B at Q8_0 with 144k context, generating 40–70 tokens/second via llama.cpp. The user now plans to buy a used 3090 for 24GB VRAM, concluding that bigger VRAM beats faster GPU for local AI workloads.
Key Points
- Added GTX 2070 Super (8GB) to RTX 5090/9800X3D rig – enables Qwen3.6-27B at Q8_0 with 144k context
- Generates 40–70 tokens/second via llama.cpp with MTP enabled
- User now prioritizes 3090 (24GB) over 5070 Ti – VRAM > raw speed for local AI
Why It Matters
Local AI users double down on VRAM: older high-VRAM GPUs often beat new low-VRAM ones.