LLMFan46 drops Qwen3.5-27B uncensored with full MTP preservation and multiple formats
New uncensored model retains all 15 MTP heads while showing just 0.35% accuracy loss
LLMFan46 has released Qwen3.5-27B-uncensored-heretic-v2, a 27B-parameter model that preserves all 15 Multi-Token Prediction (MTP) heads. The model is available in five formats: Safetensors, GGUF, NVFP4, NVFP4 GGUF, and GPTQ-Int4, covering hardware setups from consumer GPUs to quantized edge devices. According to the developer, this release fills a niche for general-purpose AI assistance, in contrast to the Qwen3.6 series (also based on the qwen35 architecture), which is optimized for agentic and coding tasks. Although the two series share the same architecture, they respond differently to abliteration and fine-tuning.
The model's key technical highlight is its robustness: despite a KL divergence of 0.0308 from the original model, which is high by usual standards, it loses only 0.35% accuracy on benchmarks. That is a smaller loss than the Qwen3.6 27B variant, which dropped 0.98% accuracy at a much lower KL divergence of 0.0021. LLMFan46 explains that Qwen3.5 models can tolerate high KL divergence without degrading benchmark performance, which the developer says makes them well suited to uncensored releases where users want maximum creative freedom. Benchmark results are published on the model's HuggingFace page. The release positions Qwen3.5-27B as an option for professionals who need uncensored capabilities with little loss in benchmark accuracy.
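For readers unfamiliar with the metric, the sketch below shows one common way a KL divergence between an original and a modified model can be computed: compare the two models' next-token probability distributions at each position and average the result. The article does not say how LLMFan46 measured the reported figures, so the averaging scheme, the direction KL(original || modified), the use of nats, and the toy logits are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def mean_kl(original_logits, modified_logits):
    """Average KL(original || modified) over token positions.

    Assumption: each argument is a list of per-position logit vectors
    produced by the two models on the same prompts.
    """
    kls = [
        kl_divergence(softmax(o), softmax(m))
        for o, m in zip(original_logits, modified_logits)
    ]
    return sum(kls) / len(kls)


# Toy logits over a 3-token vocabulary at two positions (hypothetical data).
original = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
modified = [[1.9, 1.1, 0.1], [0.4, 0.6, 3.0]]

print(f"mean KL: {mean_kl(original, modified):.4f}")
```

A value of 0 means the two models assign identical probabilities everywhere; larger values mean the modified model's predictions have drifted further from the original's. The metric measures drift in output distributions, not task accuracy, which is why a model can show a higher KL divergence and still lose less benchmark accuracy.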
- Available in 5 formats: Safetensors, GGUF, NVFP4, NVFP4 GGUF, and GPTQ-Int4 on HuggingFace.
- Optimized for general-purpose AI assistance, while Qwen3.6 is tuned for agentic/coding tasks.
- Shows only 0.35% accuracy loss despite a KL divergence of 0.0308, demonstrating high robustness.
Why It Matters
The release provides an uncensored 27B model for local deployment with a small reported accuracy trade-off, available across multiple quantization formats.