My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing.
This home-built NAS setup crushes local AI inference without expensive graphics cards...
Deep Dive
A hobbyist achieved 18 tokens/second inference on an 80-billion-parameter Qwen3-Coder-Next model using only integrated graphics on a custom NAS. The system pairs a Ryzen AI 9 HX PRO CPU with 96GB of RAM and runs llama.cpp with the Vulkan backend and flash attention enabled. After tuning the setup from 3 tok/s (CPU-only) to 18 tok/s, the builder showed that large local LLMs can run on surprisingly affordable, multi-purpose hardware with no discrete GPU.
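The reported stack (llama.cpp, Vulkan backend, flash attention) can be reproduced in outline. Below is a minimal sketch, not the builder's exact configuration: it assumes a working Vulkan driver and a locally downloaded GGUF quantization of the model. The model filename and context size are illustrative placeholders, and the flash-attention flag spelling varies across llama.cpp versions.

```shell
# Build llama.cpp with the Vulkan backend (a compile-time option).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Serve a quantized GGUF of the model on the iGPU.
# -ngl 99 offloads all layers to the GPU backend;
# -fa enables flash attention (flag spelling differs by version).
# The model path and context size here are hypothetical examples.
./build/bin/llama-server \
  -m ./models/qwen3-coder-next-80b-q4_k_m.gguf \
  -ngl 99 \
  -fa \
  --ctx-size 8192
```

On an iGPU the "VRAM" is carved out of system memory, which is why the 96GB of RAM matters: Vulkan offload lets the integrated GPU address a quantized 80B model that would not fit on any consumer discrete card.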
Why It Matters
This dramatically lowers the cost barrier for running massive local AI models, making advanced inference accessible without $1,000+ GPUs.