Qwen3.6 35B A3B uncensored heretic Native MTP Preserved Is Out Now with KLD 0.0015, 10/100 Refusals, and All 19 MTP Tensors Preserved, Available in Safetensors, GGUF, NVFP4, NVFP4 GGUF, and GPTQ-Int4 Formats
New MoE release preserves all 19 multi-token prediction tensors and refuses only 10 of 100 test prompts.
Community developer llmfan46 released Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved, with every release confirmed to retain the full set of multi-token prediction (MTP) tensors. Available formats are Safetensors, GGUF, NVFP4-Experts-Only, NVFP4-Experts-Only-GGUF, and GPTQ-Int4. The release also ships a benchmark reporting a KL divergence (KLD) of 0.0015 from the original model. Note: in Safetensors the MTP tensors appear as 19 entries because gate_up_proj is stored fused; in GGUF they appear as 20 entries because that tensor is split. The difference is purely a storage-format artifact; the same MTP weights are preserved in both.
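For anyone who wants to verify the preservation claim locally, here is a minimal sketch that counts MTP-related tensor names in a Safetensors shard. The shard filename and the "mtp" substring filter are assumptions based on common naming conventions, so check them against the actual release before trusting the count.

```python
# Minimal sketch: count MTP tensors in a Safetensors shard.
# Assumes MTP tensor names contain "mtp" (verify against the release);
# framework="np" keeps the only dependency to numpy.
from safetensors import safe_open

SHARD = "model-00001-of-00008.safetensors"  # hypothetical shard name

with safe_open(SHARD, framework="np") as f:
    mtp_keys = sorted(k for k in f.keys() if "mtp" in k.lower())

print(f"{len(mtp_keys)} MTP tensors found:")
for key in mtp_keys:
    print(" ", key)

# GGUF equivalent (gguf package from llama.cpp's gguf-py):
#   from gguf import GGUFReader
#   reader = GGUFReader("model.gguf")
#   mtp = [t.name for t in reader.tensors if "mtp" in t.name.lower()]
# Expect 19 entries in Safetensors (fused gate_up_proj) vs 20 in GGUF
# (that tensor stored split), per the note above.
```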
- Retains all 19 Multi-Token Prediction tensors, enabling 2-3x faster inference than standard single-token decoding.
- Refusal rate of only 10 out of 100 tested prompts, making it one of the most permissive open-weights models available (a minimal reproduction sketch follows this list).
- Available in five formats (Safetensors, GGUF, NVFP4-Experts-Only, NVFP4-Experts-Only-GGUF, GPTQ-Int4) to run on consumer GPUs with 12 GB to 48 GB of VRAM (rough sizing math below).
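The 10/100 figure invites reproduction. Below is a minimal sketch of one way to count refusals with a string-match heuristic against an OpenAI-compatible local server; the endpoint, model name, prompt file, and refusal markers are all illustrative assumptions, since the release's exact evaluation harness is not described here.

```python
# Minimal refusal-count sketch against an OpenAI-compatible local server
# (e.g. llama.cpp or vLLM serving the model). The endpoint, model name,
# prompt file, and refusal markers are illustrative assumptions, not the
# release's actual evaluation harness.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")

with open("test_prompts.txt") as fh:  # hypothetical: 100 prompts, one per line
    prompts = [line.strip() for line in fh if line.strip()]

refusals = 0
for prompt in prompts:
    resp = client.chat.completions.create(
        model="qwen3.6-35b-a3b-heretic",  # whatever name the server registers
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    reply = (resp.choices[0].message.content or "").lower()
    refusals += any(marker in reply for marker in REFUSAL_MARKERS)

print(f"Refusals: {refusals}/{len(prompts)}")
```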
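And for the quoted VRAM range, a back-of-envelope sizing sketch. The bits-per-weight averages are typical values for these formats, not measured file sizes, and 12 GB cards would rely on offloading inactive experts, since an A3B MoE activates only about 3B of the 35B parameters per token.

```python
# Back-of-envelope weight-size estimate for a 35B-parameter model at
# typical average bit widths (assumed values, not measured file sizes).
# KV cache and activations need headroom on top of these numbers.
PARAMS = 35e9

def weights_gib(bits_per_weight: float) -> float:
    """Approximate weight footprint in GiB for an average bit width."""
    return PARAMS * bits_per_weight / 8 / 2**30

for label, bpw in [
    ("NVFP4 / GPTQ-Int4 (~4.25 bpw avg)", 4.25),
    ("GGUF Q4_K_M (~4.8 bpw avg)", 4.8),
    ("BF16 Safetensors (16 bpw)", 16.0),
]:
    print(f"{label}: ~{weights_gib(bpw):.1f} GiB")

# ~4-bit lands near 17-20 GiB and BF16 near 65 GiB, consistent with the
# quoted 12-48 GB range once expert offloading and partial quantization
# are taken into account.
```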
Why It Matters
Pushes the limits of local MoE inference by pairing preserved MTP acceleration with uncensored outputs for advanced AI experimentation.