Open Source

Dual RTX 3090 build powers local LLM inference for agentic work

Two 3090s running Qwen 3.6 27B with nginx and VSCode...

Deep Dive

A Reddit user showcased a dual RTX 3090 build dedicated to local LLM inference, which they say has rekindled their interest in software engineering. The system runs Qwen 3.6 27B (likely a variant of Qwen 2.5 or a custom model), with VSCode as the frontend and an nginx server exposing the model's API. The user wants to make the setup work in a professional environment, focusing on agentic workflows (autonomous AI agents) and RAG (retrieval-augmented generation) pipelines for better codebase understanding. They also ask about MCP servers and custom tools that would make the setup more usable.
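To make the RAG idea concrete, here is a minimal sketch of the retrieval step for a codebase, assuming a simple word-overlap (bag-of-words cosine) score instead of a real embedding model. The file names, chunk contents, and the helpers `retrieve` and `build_prompt` are hypothetical and do not come from the post; the resulting prompt would be sent to whatever local model endpoint the setup exposes.

```python
import math
import re
from collections import Counter

# Hypothetical code chunks; a real pipeline would index actual source files.
CHUNKS = {
    "auth.py": "def login(user, password): check password hash and create session token",
    "db.py": "def connect(url): open database connection pool",
    "routes.py": "def register_routes(app): attach http handlers to app",
}

def tokenize(text):
    """Lowercase the text and split it into identifier-like tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Return the names of the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks.items(),
        key=lambda item: cosine(q, Counter(tokenize(item[1]))),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

def build_prompt(query, chunks, k=2):
    """Prepend the retrieved chunks to the question (the 'augmented' part of RAG)."""
    context = "\n\n".join(f"# {name}\n{chunks[name]}" for name in retrieve(query, chunks, k))
    return f"Use the code below to answer.\n\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    print(build_prompt("how does login check the password", CHUNKS, k=1))
```

A production pipeline would typically swap the word-overlap score for vector embeddings and a proper index, but the flow stays the same: retrieve relevant code, then prepend it to the question.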

The post highlights a broader trend: hobbyists and professionals are increasingly building high-end local rigs to avoid rising cloud API costs. Dual 3090s provide 48GB of VRAM in total (24GB per card). That memory is not pooled into a single address space, with or without NVLink, so a model has to be split across the two GPUs. Within that budget, such setups can run 27B-class models at usable speeds, particularly when the weights are quantized. The user's interest in agentic work and custom integrations mirrors the industry shift toward self-hosted, specialized AI solutions. While not replacing ChatGPT overnight, this build demonstrates that local inference is becoming viable for development and research tasks, especially when privacy and cost control matter.
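The VRAM arithmetic behind that claim can be sketched roughly as follows. This is a back-of-the-envelope estimate, not a measurement from the post: it counts only the model weights and adds a flat overhead figure for KV cache and activations, and that `overhead_gb` value is an assumption chosen purely for illustration.

```python
TOTAL_VRAM_GB = 2 * 24  # two RTX 3090 cards, 24GB each

def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8

def fits(params_billion, bits_per_weight, overhead_gb=4.0):
    """Rough check: weights plus an assumed overhead must fit in total VRAM."""
    return weight_memory_gb(params_billion, bits_per_weight) + overhead_gb <= TOTAL_VRAM_GB

if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_memory_gb(27, bits)
        print(f"{bits}-bit: ~{gb:.1f} GB of weights, fits in {TOTAL_VRAM_GB} GB: {fits(27, bits)}")
```

Under these assumptions a 27B model at 16-bit precision needs about 54GB for weights alone and does not fit, while 8-bit (about 27GB) and 4-bit (about 13.5GB) variants do, leaving headroom for longer contexts.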

Key Points
  • Dual RTX 3090 system provides ~48GB VRAM for running 27B parameter models locally
  • User runs Qwen 3.6 27B with VSCode as the frontend and nginx for API access
  • Focus on agentic workflows and RAG pipelines for codebase understanding

Why It Matters

Local AI inference on consumer hardware is becoming viable, reducing dependency on expensive cloud APIs.