Open Source

Dual RTX 3090 setup hits 113 tk/s with Qwen 3.6 27B locally

r/LocalLLaMA May 14, 2026

⚡ Ubuntu dual boot unlocks 4000 pp/s prompt processing on 48GB VRAM, which the user says beats cloud inference speed.

Deep Dive

A Reddit user reports that dual RTX 3090s (48GB VRAM total) can run Qwen 3.6 27B with a 262K context using the open-source 'club-3090' project. Under WSL2 the setup reached only 30 tk/s generation and 400 pp/s prompt processing, but switching to a native Ubuntu dual boot raised those figures to 113 tk/s and 4000 pp/s, all without NVLink. The configuration needed patches from Claude Sonnet to fix SSE session drops and tool-calling bugs, and the user describes the resulting local inference as 'almost-sonnet level' for code review and monkey patching.
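To put the reported numbers in perspective, here is a minimal sketch of the arithmetic behind them. It only uses the four throughput figures quoted in the post. Two things are assumptions made for illustration: that "262K context" means 262,144 tokens, and the hypothetical 1,000-token reply length. It also assumes throughput stays constant across the whole context, which real runs are unlikely to match, so treat the times as rough lower bounds rather than measurements.

```python
# Throughput figures reported in the post (tokens per second).
WSL2 = {"prompt": 400.0, "generation": 30.0}
UBUNTU = {"prompt": 4000.0, "generation": 113.0}

# Assumption: "262K context" means 262,144 tokens (256 * 1024).
CONTEXT_TOKENS = 262_144
# Assumption: a hypothetical 1,000-token reply, chosen only for illustration.
REPLY_TOKENS = 1_000


def speedup(before: float, after: float) -> float:
    """Ratio of the new throughput to the old one."""
    return after / before


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Idealized time to handle `tokens` at a constant rate."""
    return tokens / tokens_per_second


prompt_speedup = speedup(WSL2["prompt"], UBUNTU["prompt"])
generation_speedup = speedup(WSL2["generation"], UBUNTU["generation"])

print(f"prompt processing speedup: {prompt_speedup:.2f}x")
print(f"generation speedup: {generation_speedup:.2f}x")

for name, rates in (("WSL2", WSL2), ("Ubuntu", UBUNTU)):
    prefill = seconds_for(CONTEXT_TOKENS, rates["prompt"])
    decode = seconds_for(REPLY_TOKENS, rates["generation"])
    print(f"{name}: full-context prefill {prefill:.1f}s, reply {decode:.1f}s")
```

Under these assumptions, the prompt-processing ratio is exactly 10x and the generation ratio comes out near 3.77, which the summary truncates to 3.7x. Filling the full context would take about 11 minutes at the WSL2 rate versus roughly 66 seconds on native Ubuntu, which is the difference that matters most for long code-review sessions.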

The user sees this as a viable path for budget local AI setups (≈$3K for two used 3090s). In their view, current small models (27B) already beat cloud APIs on speed while rivaling frontier models on domain-specific tasks such as SSH session management and code review. Speculation about next upgrades (M5 Ultra, DGX Sparks) suggests the community anticipates frontier-class small models within the next 12 months, which would further democratize high-performance local AI.

Key Points

Dual RTX 3090 (48GB VRAM) runs Qwen 3.6 27B at 113 tk/s and 4000 pp/s on Ubuntu
Switching from WSL2 to native Linux increased prompt processing by 10x and generation by 3.7x
User calls local inference 'almost-sonnet level' for code review and agentic tasks like SSH management

Why It Matters

Suggests that a pair of used RTX 3090s can rival cloud inference speeds locally, enabling private, cost-effective coding assistants.
