Open Source

Unsloth releases quantized Gemma 4 QAT MTP models for local AI

Google's Gemma 4 now runs locally with QAT optimization and MTP support

Deep Dive

Unsloth, known for its efficient fine-tuning and quantization tools, has released quantized GGUF versions of Google's Gemma 4 family with QAT (quantization-aware training) and MTP (multi-token prediction) support. The files, named mtp-gemma-4-*.gguf, cover sizes from 12B to 31B, including the mixture-of-experts 26B-A4B (26 billion total parameters, roughly 4 billion active per token), alongside the compact E2B and E4B variants. All are available in q8_0 (8-bit block quantization) and larger quantization levels, stored in an MTP folder on Hugging Face.
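
To get a feel for what q8_0 means for download size and memory, here is a rough back-of-the-envelope sketch. It relies on the GGUF Q8_0 layout, which stores weights in blocks of 32 int8 values plus one 16-bit scale, or 8.5 bits per weight. The parameter counts are assumptions read off the model names, and the helper names are illustrative; real files will differ somewhat because of metadata and tensors kept at other precisions.

```python
# Rough size estimate for q8_0 GGUF weights (a sketch, not an exact file size).
# Q8_0 layout: each block holds 32 int8 weights plus one fp16 scale.
Q8_0_BLOCK_WEIGHTS = 32
Q8_0_BLOCK_BYTES = 32 * 1 + 2  # 32 one-byte weights + 2-byte scale = 34 bytes


def q8_0_bits_per_weight() -> float:
    """Effective storage cost per weight, including the per-block scale."""
    return Q8_0_BLOCK_BYTES * 8 / Q8_0_BLOCK_WEIGHTS


def estimate_q8_0_gib(n_params: float) -> float:
    """Approximate weight storage in GiB, ignoring metadata and
    any tensors stored at a different precision."""
    n_blocks = n_params / Q8_0_BLOCK_WEIGHTS
    return n_blocks * Q8_0_BLOCK_BYTES / 2**30


if __name__ == "__main__":
    # Parameter counts are assumed from the model names, not measured.
    for label, n_params in [("12B", 12e9), ("26B-A4B", 26e9), ("31B", 31e9)]:
        print(f"{label}: ~{estimate_q8_0_gib(n_params):.1f} GiB at q8_0")
```

Note that for a mixture-of-experts model such as the 26B-A4B, the estimate uses the total parameter count: all experts generally have to be loaded, even though only a fraction is active for each token.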

For developers and local AI enthusiasts, this means running Gemma 4 on consumer hardware, with mobile-optimized versions for on-device deployment. QAT simulates quantization during training, which helps the models retain accuracy after aggressive quantization. MTP lets a model predict several tokens per step, which is typically used to speed up generation through speculative decoding. The release lowers the barrier to running Google's latest open-weight models locally, especially on edge devices.
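
The idea behind QAT can be illustrated with a "fake quantization" round trip: weights are rounded to a low-bit grid and mapped back to floats, so training sees the rounding error it will face after export. The sketch below shows only that rounding step on a single block of weights, using simple symmetric absmax scaling. It is a generic illustration with hypothetical function names and made-up sample values, not Google's or Unsloth's actual recipe.

```python
def fake_quantize(block, bits=8):
    """Quantize-dequantize one block of weights with symmetric absmax scaling.

    Returns floats snapped to the nearest representable low-bit value.
    """
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit, 7 for 4-bit
    amax = max(abs(w) for w in block)
    if amax == 0.0:
        return [0.0 for _ in block]     # all-zero block needs no scale
    scale = amax / qmax                 # one scale shared by the whole block
    return [round(w / scale) * scale for w in block]


def max_error(block, bits):
    """Largest absolute difference introduced by the round trip."""
    return max(abs(w - q) for w, q in zip(block, fake_quantize(block, bits)))


if __name__ == "__main__":
    # Illustrative values only.
    block = [0.8, -0.31, 0.05, 0.47, -0.92, 0.13, -0.66, 0.29]
    for bits in (8, 4):
        print(f"{bits}-bit max round-trip error: {max_error(block, bits):.4f}")
```

The error grows as the bit width shrinks, which is why training with the rounding in the loop tends to matter more for aggressive quantization levels than for 8-bit.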

Key Points
  • Unsloth released GGUF quantized versions of Gemma 4 (12B to 31B) with QAT and MTP features
  • Models available in q8_0 and larger quants, with mobile variants for E2B and E4B
  • Optimized for local inference; MTP (multi-token prediction) is aimed at faster token generation

Why It Matters

Gemma 4 QAT MTP models pair quantization-aware accuracy with multi-token prediction, making high-quality local AI practical on consumer and mobile hardware