Open Source

6-GPU local LLM workstation (≈200GB+ VRAM) – looking for scaling / orchestration advice

This monster rig runs three reasoning models at once; here's how the builder did it, and where they're looking for help next.

Deep Dive

A developer has built a massive local AI workstation: six GPUs with more than 200GB of aggregate VRAM on a Threadripper PRO platform. The system concurrently runs three open-source reasoning models for internal data analysis and workflow automation. The builder is now seeking community advice on scaling and orchestration, and on identifying likely bottlenecks, such as PCIe bandwidth and CPU overhead, in multi-GPU inference setups at this scale.
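
Since the post is about orchestration, here is a minimal sketch of one common approach to this kind of setup: pin each model to its own GPU pair via CUDA_VISIBLE_DEVICES and shard it across that pair with tensor parallelism. The vLLM invocation is one real way to do this, but the model IDs, GPU pairings, and ports below are illustrative assumptions, not details from the actual build.

```python
"""Sketch: run three model servers, each pinned to a disjoint GPU pair.

Assumptions (not from the original post): GPUs 0-5 are visible, models
are served with vLLM's OpenAI-compatible server ("vllm serve"), and the
model names and ports are placeholders.
"""
import os
import subprocess

# Each entry: (placeholder model ID, GPU pair, port).
SERVERS = [
    ("org/reasoning-model-a", "0,1", 8001),
    ("org/reasoning-model-b", "2,3", 8002),
    ("org/reasoning-model-c", "4,5", 8003),
]

procs = []
for model, gpus, port in SERVERS:
    env = os.environ.copy()
    # Each server only sees its own two GPUs, so the three models
    # never contend for the same device.
    env["CUDA_VISIBLE_DEVICES"] = gpus
    procs.append(subprocess.Popen(
        [
            "vllm", "serve", model,
            "--tensor-parallel-size", "2",  # shard weights across the pair
            "--port", str(port),
        ],
        env=env,
    ))

# Block until the servers exit (in practice, a process supervisor
# such as systemd or docker compose would manage restarts).
for p in procs:
    p.wait()
```

Keeping each server's tensor-parallel group inside one GPU pair, ideally a pair that shares a PCIe switch or NUMA node, confines cross-device traffic to that pair. That is exactly where the PCIe bandwidth and CPU overhead questions raised in the post tend to surface first.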

Why It Matters

This build showcases the bleeding edge of private, high-performance AI infrastructure, pushing the limits of what's possible on local hardware.