Qwen3.6 35B A3B uncensored heretic Native MTP Preserved Is Out Now with KLD 0.0015, 10/100 Refusals, and All 19 MTP Tensors Preserved, Available in Safetensors, GGUF, NVFP4, NVFP4 GGUF, and GPTQ-Int4 Formats
New MoE release preserves all 19 multi-token prediction tensors and refuses only 10 of 100 test prompts.
Community developer llmfan46 released Qwen3.6-35B-A3B-uncensored-heretic-Native-MTP-Preserved, with every release confirmed to retain the full set of multi-token prediction (MTP) tensors. Available formats are Safetensors, GGUF, NVFP4-Experts-Only, NVFP4-Experts-Only-GGUF, and GPTQ-Int4. The release also ships a benchmark reporting a KL divergence (KLD) of 0.0015 from the original model. Note: in Safetensors the MTP tensors appear as 19 entries because gate_up_proj is stored fused; in GGUF they appear as 20 entries because that tensor is split. The difference is purely a storage-format artifact; the same MTP weights are preserved in both.
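For anyone who wants to verify the preservation claim locally, here is a minimal sketch that counts MTP-related tensor names in a Safetensors shard. The shard filename and the "mtp" substring filter are assumptions based on common naming conventions, so check them against the actual release before trusting the count.

```python
# Minimal sketch: count MTP tensors in a Safetensors shard.
# Assumes MTP tensor names contain "mtp" (verify against the release);
# framework="np" keeps the only dependency to numpy.
from safetensors import safe_open

SHARD = "model-00001-of-00008.safetensors"  # hypothetical shard name

with safe_open(SHARD, framework="np") as f:
    mtp_keys = sorted(k for k in f.keys() if "mtp" in k.lower())

print(f"{len(mtp_keys)} MTP tensors found:")
for key in mtp_keys:
    print(" ", key)

# GGUF equivalent (gguf package from llama.cpp's gguf-py):
#   from gguf import GGUFReader
#   reader = GGUFReader("model.gguf")
#   mtp = [t.name for t in reader.tensors if "mtp" in t.name.lower()]
# Expect 19 entries in Safetensors (fused gate_up_proj) vs 20 in GGUF
# (that tensor stored split), per the note above.
```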
- Retains all 19 Multi-Token Prediction tensors, enabling 2-3x faster inference than standard single-token decoding.
- Refusal rate of only 10 out of 100 tested prompts, making it one of the most permissive open-weights models available (a minimal reproduction sketch follows this list).
- Available in five formats (Safetensors, GGUF, NVFP4-Experts-Only, NVFP4-Experts-Only-GGUF, GPTQ-Int4) to run on consumer GPUs with 12 GB to 48 GB of VRAM (rough sizing math below).
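The 10/100 figure invites reproduction. Below is a minimal sketch of one way to count refusals with a string-match heuristic against an OpenAI-compatible local server; the endpoint, model name, prompt file, and refusal markers are all illustrative assumptions, since the release's exact evaluation harness is not described here.

```python
# Minimal refusal-count sketch against an OpenAI-compatible local server
# (e.g. llama.cpp or vLLM serving the model). The endpoint, model name,
# prompt file, and refusal markers are illustrative assumptions, not the
# release's actual evaluation harness.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")

with open("test_prompts.txt") as fh:  # hypothetical: 100 prompts, one per line
    prompts = [line.strip() for line in fh if line.strip()]

refusals = 0
for prompt in prompts:
    resp = client.chat.completions.create(
        model="qwen3.6-35b-a3b-heretic",  # whatever name the server registers
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    reply = (resp.choices[0].message.content or "").lower()
    refusals += any(marker in reply for marker in REFUSAL_MARKERS)

print(f"Refusals: {refusals}/{len(prompts)}")
```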
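And for the quoted VRAM range, a back-of-envelope sizing sketch. The bits-per-weight averages are typical values for these formats, not measured file sizes, and 12 GB cards would rely on offloading inactive experts, since an A3B MoE activates only about 3B of the 35B parameters per token.

```python
# Back-of-envelope weight-size estimate for a 35B-parameter model at
# typical average bit widths (assumed values, not measured file sizes).
# KV cache and activations need headroom on top of these numbers.
PARAMS = 35e9

def weights_gib(bits_per_weight: float) -> float:
    """Approximate weight footprint in GiB for an average bit width."""
    return PARAMS * bits_per_weight / 8 / 2**30

for label, bpw in [
    ("NVFP4 / GPTQ-Int4 (~4.25 bpw avg)", 4.25),
    ("GGUF Q4_K_M (~4.8 bpw avg)", 4.8),
    ("BF16 Safetensors (16 bpw)", 16.0),
]:
    print(f"{label}: ~{weights_gib(bpw):.1f} GiB")

# ~4-bit lands near 17-20 GiB and BF16 near 65 GiB, consistent with the
# quoted 12-48 GB range once expert offloading and partial quantization
# are taken into account.
```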
Why It Matters
Pushes the limits of local MoE inference by pairing preserved MTP acceleration with uncensored outputs for advanced AI experimentation.