Open Source

Train MoE models 12x faster with 30% less memory! (<15GB VRAM)

r/LocalLLaMA February 10, 2026

⚡ These kernels could make large MoE models practical to fine-tune on consumer GPUs.

Deep Dive

Unsloth AI has released custom Triton kernels that reportedly make training Mixture of Experts (MoE) models 12x faster while using over 35% less VRAM, with no loss in accuracy. The optimizations support models such as Qwen3-30B and GPT-OSS-20B, which can now be fine-tuned in as little as 12.8GB of VRAM. The gains grow with model size, and the kernels run on both data-center GPUs and consumer cards such as the RTX 3090.
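For context on why MoE training benefits from custom kernels, here is a minimal NumPy sketch of a naive top-k routed MoE layer. It is a generic illustration, not Unsloth's Triton implementation; the summary does not describe how those kernels work. The names `moe_forward`, `router_w`, `expert_ws`, and `top_k` are hypothetical. The per-expert gather, small matmul, and scatter-add loop is the kind of irregular, memory-heavy work that fused or grouped GPU kernels generally aim to speed up.

```python
import numpy as np


def moe_forward(x, router_w, expert_ws, top_k=2):
    """Naive top-k Mixture-of-Experts forward pass (illustrative sketch only).

    x:         (num_tokens, d_model) token activations
    router_w:  (d_model, num_experts) router projection (hypothetical name)
    expert_ws: list of (d_model, d_model) weight matrices, one per expert
    """
    logits = x @ router_w                                # (T, E) router scores
    top_idx = np.argsort(-logits, axis=1)[:, :top_k]     # (T, k) chosen experts per token
    top_logits = np.take_along_axis(logits, top_idx, axis=1)

    # Softmax over the selected experts only, so each token's gates sum to 1
    gates = np.exp(top_logits - top_logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)

    out = np.zeros_like(x)
    for e, w in enumerate(expert_ws):
        # Gather the tokens routed to expert e (and which top-k slot chose it)
        token_ids, slot = np.nonzero(top_idx == e)
        if token_ids.size == 0:
            continue
        # One small, irregularly sized matmul per expert
        y = x[token_ids] @ w
        # Scatter-add the gate-weighted expert outputs back to token positions
        np.add.at(out, token_ids, gates[token_ids, slot][:, None] * y)
    return out
```

Because each expert sees a different, data-dependent number of tokens, a straightforward implementation launches many small operations and materializes intermediate copies of the routed activations; avoiding that overhead is a common motivation for specialized MoE kernels.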

Why It Matters

This significantly lowers the cost and hardware barrier for developers and researchers who want to experiment with and deploy state-of-the-art MoE architectures.
