Open Source

Unsloth applies Multi-Token Prediction to Qwen 3.6 models, boosting speed

Two new GGUF quantized models – 27B dense and 35B MoE – get faster inference via MTP.

Deep Dive

Two new GGUF quantized models from Unsloth are available on Hugging Face: Qwen3.6-27B-GGUF-MTP and Qwen3.6-35B-A3B-GGUF-MTP, both supporting Multi-Token Prediction (MTP).

Key Points
  • Two new GGUF models: Qwen3.6-27B (dense) and Qwen3.6-35B-A3B (MoE with about 3B active parameters).
  • Both use Multi-Token Prediction (MTP), which predicts several tokens per step instead of one, making generation roughly 1.5–2x faster.
  • Quantized for low memory use: the 35B variant fits on a 24 GB GPU at 4-bit quantization.
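
The 24 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is only an estimate of weight storage, not a measurement of the released files: the helper names are hypothetical, and the 4.5 bits-per-weight value is an illustrative assumption, since practical 4-bit formats also store scaling metadata. KV cache, activations, and any extra MTP weights are not counted.

```python
# Rough weight-memory estimate for a quantized model.
# Assumptions (not from the release): helper names are hypothetical, and
# 4.5 bits/weight is an illustrative allowance for quantization metadata.

def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 1e9


def fits_in_vram(n_params: float, bits_per_weight: float, vram_gb: float) -> bool:
    """True if the weights alone fit; ignores KV cache and activations."""
    return weight_gb(n_params, bits_per_weight) < vram_gb


if __name__ == "__main__":
    params_35b = 35e9
    print(f"ideal 4-bit:  {weight_gb(params_35b, 4.0):.1f} GB")
    print(f"assumed 4.5:  {weight_gb(params_35b, 4.5):.1f} GB")
    print(f"fits in 24 GB: {fits_in_vram(params_35b, 4.5, 24.0)}")
```

At an ideal 4 bits per weight, 35B parameters take 17.5 GB, which leaves a few gigabytes of headroom on a 24 GB card for context. The same weights at 16 bits would need 70 GB, which is why quantization is what makes single-GPU use possible here.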

Why It Matters

Brings roughly 1.5–2x faster CPU and GPU inference to large open-source models, and the 4-bit quantization puts the 35B Qwen 3.6 model within reach of a single 24 GB GPU.