Dual RTX 3090 build powers local LLM inference for agentic work
Two 3090s running Qwen 3.6 27B with nginx and VSCode...
A Reddit user showcased a dual RTX 3090 build dedicated to local LLM inference, rekindling their interest in software engineering. The system runs Qwen 3.6 27B (likely a variant of Qwen 2.5 or a custom model) using VSCode as the frontend and an nginx server for API exposure. The user explicitly seeks to make the setup work in a professional environment, focusing on agentic workflows (autonomous AI agents) and RAG (retrieval-augmented generation) pipelines for better codebase understanding. They also ask about MCP servers and custom tools to enhance usability.
The post highlights a broader trend: hobbyists and professionals are increasingly building high-end local rigs to avoid rising cloud API costs. With dual 3090s providing around 48GB of VRAM (24GB per card, but limited by NVLink constraints), such setups can run 27B-class models at reasonable speeds. The user's interest in agentic work and custom integrations mirrors the industry shift toward self-hosted, specialized AI solutions. While not replacing ChatGPT overnight, this build demonstrates that local inference is becoming viable for development and research tasks, especially when privacy and cost control matter.
- Dual RTX 3090 system provides ~48GB VRAM for running 27B parameter models locally
- User runs Qwen 3.6 27B via VSCode preview with nginx for API access
- Focus on agentic workflows and RAG pipelines for codebase understanding
Why It Matters
Local AI inference on consumer hardware is becoming viable, reducing dependency on expensive cloud APIs.