Open Source

Unsloth releases quantized Gemma 4 QAT MTP models for local AI

r/LocalLLaMA June 10, 2026

⚡ Google's Gemma 4 now runs locally with QAT optimization and MTP support

Deep Dive

Unsloth, known for its fine-tuning and quantization tools, has released quantized GGUF versions of Google's Gemma 4 family with QAT (quantization-aware training) and MTP (multi-token prediction) support. The files, named mtp-gemma-4-*.gguf, cover models from 12B to 31B, including the mixture-of-experts 26B-A4B (26 billion total parameters, of which roughly 4 billion are active per token, following the usual naming convention), alongside the compact E2B and E4B variants. All are available in q8_0 (an 8-bit block-wise quantization format) and larger quantization levels, stored in an MTP folder on Hugging Face.
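To make the q8_0 label concrete, here is a minimal Python sketch of an 8-bit block format of this kind. It assumes the layout used by ggml's Q8_0 type, where each block of 32 weights shares a single scale. The function names are illustrative only, and the real format stores the scale as a 16-bit float rather than a Python float.

```python
import math

BLOCK_SIZE = 32  # weights per block, as in ggml's Q8_0 layout


def _round_half_away(x):
    """Round to nearest integer, ties away from zero (like C's roundf)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_q8_0_block(weights):
    """Quantize one block of 32 floats to (scale, list of int8-range values)."""
    if len(weights) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} weights, got {len(weights)}")
    amax = max(abs(w) for w in weights)
    scale = amax / 127.0  # largest magnitude maps to +/-127
    if scale == 0.0:
        return 0.0, [0] * BLOCK_SIZE  # all-zero block
    quants = [_round_half_away(w / scale) for w in weights]
    return scale, quants


def dequantize_q8_0_block(scale, quants):
    """Recover approximate float weights from a quantized block."""
    return [scale * q for q in quants]
```

Each block costs 32 bytes of quantized values plus a 2-byte scale, which works out to 8.5 bits per weight. The reconstruction error per weight is bounded by half the block's scale.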

For developers and local AI enthusiasts, this means Gemma 4 can run on consumer hardware with multi-token prediction, in which the model proposes several tokens per step, a technique typically used to speed up generation through speculative decoding. The E2B and E4B variants target mobile and on-device deployment. QAT trains the model with quantization in the loop, so it is intended to retain more accuracy after quantization than post-training quantization alone. The release lowers the barrier to running Google's latest open-weight models locally, especially on edge devices and in latency-sensitive applications.
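Whether a given model fits on consumer hardware depends largely on the size of its weights. The sketch below gives a rough estimate at q8_0, assuming 8.5 bits per weight (32 one-byte values plus a 16-bit scale per block) and taking the nominal parameter counts from the model names. These are assumptions, not measured file sizes: real GGUF files may keep some tensors at other precisions, and the KV cache and runtime overhead are not included.

```python
Q8_0_BITS_PER_WEIGHT = 8.5  # assumed: 32 int8 values + one 16-bit scale per block


def estimate_weight_gib(params_billions, bits_per_weight=Q8_0_BITS_PER_WEIGHT):
    """Rough size of the quantized weights in GiB (weights only)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Nominal sizes read off the model names; actual parameter counts may differ.
for name, params in [("12B", 12), ("26B-A4B", 26), ("31B", 31)]:
    print(f"{name}: ~{estimate_weight_gib(params):.1f} GiB of weights at q8_0")
```

For a mixture-of-experts model, the total parameter count rather than the active subset usually determines weight storage, since all experts typically need to be resident or reachable during inference.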

Key Points

Unsloth released GGUF quantized versions of Gemma 4 (12B to 31B) with QAT and MTP features
Models available in q8_0 and larger quants, with mobile variants for E2B and E4B
Optimized for local inference; MTP (multi-token prediction) is aimed at faster generation

Why It Matters

Gemma 4 QAT MTP models bring quantization-aware accuracy and multi-token prediction to local inference on consumer and mobile hardware
