Got MTP + TurboQuant running: Qwen3.6-27B at 80+ t/s with 262K context on a single RTX 4090
Tuned performance reaches 87 t/s using TurboQuant KV-cache quantization and multi-token prediction (MTP).
Deep Dive
According to the article, Indrasmirror optimized Qwen3.6-27B on a 24 GB RTX 4090, reaching 80–87 t/s after tuning, up from a 43 t/s baseline. The setup uses TBQ4_0 (TurboQuant's lossless 4.25 bpv KV cache), a 262K context, and MTP with a 73% draft-acceptance rate at draft length 3; the author reports solid output quality. The fork is buildable from the linked GitHub repository, with kernel-architecture details in the accompanying blog post.
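To see why a 4.25 bpv KV cache matters for fitting a 262K context on a 24 GB card, here is a back-of-the-envelope sizing sketch. The article does not give Qwen3.6-27B's architecture, so the layer count, KV-head count, and head dimension below are assumed placeholder values; treat the resulting numbers as illustrative only:

```python
# KV-cache memory estimate. The architecture numbers are ASSUMPTIONS,
# not published Qwen3.6-27B specs -- swap in the real config if known.
def kv_cache_gib(context_len: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bits_per_value: float) -> float:
    """Return KV-cache size in GiB (K and V tensors across all layers)."""
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len  # K + V
    return elements * bits_per_value / 8 / 2**30

CTX = 262_144  # 262K context from the post
ARCH = dict(n_layers=48, n_kv_heads=8, head_dim=128)  # hypothetical GQA config

fp16 = kv_cache_gib(CTX, bits_per_value=16.0, **ARCH)
tbq = kv_cache_gib(CTX, bits_per_value=4.25, **ARCH)  # TBQ4_0 at 4.25 bpv

print(f"fp16 KV cache:     {fp16:.2f} GiB")  # larger than the whole card
print(f"4.25 bpv KV cache: {tbq:.2f} GiB")
```

Whatever the real layer/head counts, the ratio is fixed: 4.25 bpv is a ~3.8x reduction over fp16, which is what moves a quarter-million-token cache from impossible to plausible alongside the quantized weights.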
Key Points
- Achieved 80–87 t/s on the RTX 4090, roughly double the 43 t/s baseline.
- Utilized TurboQuant's lossless KV cache and MTP, with a 262K context.
- The MTP draft-acceptance rate is 73%, so most speculated tokens are kept; the author reports no loss in output quality.
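The 73% figure governs how much speedup MTP can deliver. Assuming acceptance is independent per draft token (a simplification; the post does not say how the 73% is measured), the expected number of tokens emitted per target-model pass at draft length 3 can be sketched as:

```python
# Expected tokens per verification step in speculative/MTP decoding,
# ASSUMING i.i.d. per-token acceptance (a simplifying model).
def expected_tokens_per_step(p_accept: float, draft_len: int) -> float:
    # 1 guaranteed token from the target model, plus a geometric run of
    # accepted draft tokens: 1 + p + p^2 + ... + p^draft_len
    return sum(p_accept ** i for i in range(draft_len + 1))

print(f"{expected_tokens_per_step(0.73, 3):.2f} tokens per target pass")
```

Under this toy model, ~2.65 tokens per target forward pass is an upper bound on the speedup; after draft-model overhead, a roughly 2x gain like the reported 43 → 87 t/s is in the expected range.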
Why It Matters
Sustaining 80+ t/s at a 262K context on a single consumer GPU shows that long-context inference is becoming practical on local hardware, lowering the cost barrier for developers and researchers alike.