Reddit user finds 220W sweet spot for 4x RTX 3090 AI inference
Peak efficiency at 220W per GPU with Qwen3.6-27B, at a cost of about 8% of total throughput
A Reddit user (anitamaxwynnn69) conducted a rigorous power efficiency test on a 4x RTX 3090 setup running AI inference with vLLM v0.20.2 and the Qwen3.6-27B model at FP16 precision, using tensor parallelism (TP=4). The GPUs included Dell OEM, EVGA XC3, and two ASUS Strix cards, connected via PCIe Gen3 (bifurcated x16/x8/x8/x4). The user tested power limits from 200W to unrestricted (350-390W) and measured output tokens per second (t/s), prompt processing speed, and total throughput efficiency in tokens per joule.
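The summary does not say how the power limits were applied. One common way to cap NVIDIA cards is `nvidia-smi -i <index> -pl <watts>`, which usually needs root. The sketch below only builds and prints those commands for a few of the limits mentioned here; the helper names, the GPU indices 0-3, and the choice of 200, 220, and 250 W as sweep points are assumptions for illustration, not details from the post.

```python
import subprocess
from typing import List


def power_limit_commands(limit_watts: int, gpu_indices: List[int]) -> List[List[str]]:
    """Build one `nvidia-smi` power-limit command per GPU (hypothetical helper)."""
    if limit_watts <= 0:
        raise ValueError("power limit must be positive")
    return [
        ["nvidia-smi", "-i", str(idx), "-pl", str(limit_watts)]
        for idx in gpu_indices
    ]


def apply_power_limit(limit_watts: int, gpu_indices: List[int], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in power_limit_commands(limit_watts, gpu_indices):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumed sweep points and GPU indices; adjust to the actual rig.
    for watts in (200, 220, 250):
        apply_power_limit(watts, gpu_indices=[0, 1, 2, 3])
```

With `dry_run=True` nothing on the system changes, so the command list can be reviewed before a benchmark is run at each limit.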
Key results: At the unrestricted 350-390W, total throughput was 269 t/s with an efficiency of 0.77 t/joule. Dropping to 220W gave 248 t/s and peak efficiency of 1.13 t/joule, a roughly 47% efficiency gain for about an 8% throughput loss. Above 250W, returns diminished sharply. The 220W sweet spot matches earlier blog findings (referenced in the post). The setup uses an open mining frame with 10x TL-C12C-S fans. The user is now exploring larger models such as DeepSeek v4 at Q2 quantization, while noting that Qwen3.6-27B remains a solid daily driver for multi-GPU hobbyists.
- Peak efficiency of 1.13 tokens/joule at 220W per GPU, vs 0.77 at unrestricted power
- Output (generation) speed drops only from 29 to 27 t/s when power is lowered from the unrestricted 350-390W to 220W, a loss of about 7%
- Diminishing returns clearly observed: raising power above 250W provides negligible throughput gains
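The percentages above can be checked with a few lines of arithmetic. In this minimal sketch the throughput and power figures come from the summary, while the function names are hypothetical. Dividing total throughput by the per-GPU limit, with 350 W taken for the unrestricted case, reproduces the reported 1.13 and 0.77 t/joule. That reading is inferred from the numbers and is not stated in the post.

```python
def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Tokens per joule: (tokens/s) divided by (joules/s)."""
    return tokens_per_second / watts


def percent_change(old: float, new: float) -> float:
    """Signed percentage change from old to new."""
    return (new - old) / old * 100.0


# Assumption: the efficiency figures use the per-GPU power limit,
# with 350 W standing in for the unrestricted run.
eff_220 = tokens_per_joule(248, 220)
eff_unrestricted = tokens_per_joule(269, 350)

print(f"220 W:        {eff_220:.2f} t/J")           # 1.13
print(f"unrestricted: {eff_unrestricted:.2f} t/J")  # 0.77
print(f"efficiency gain:   {percent_change(eff_unrestricted, eff_220):+.1f}%")  # +46.7%
print(f"total throughput:  {percent_change(269, 248):+.1f}%")                   # -7.8%
print(f"output speed:      {percent_change(29, 27):+.1f}%")                     # -6.9%
```

If the efficiency figures are indeed per-GPU, tokens per joule for the whole rig would be lower, since four cards draw power at once.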
Why It Matters
Power optimization cuts electricity costs and heat for multi-GPU AI rigs, making high-end inference more accessible.