Open Source

Qwen3.5 35B A3B uncensored model drops with full MTP preservation

r/LocalLLaMA May 26, 2026

⚡ Uncensored Qwen3.5 with 785 MTPs retained launches on Hugging Face in Safetensors, GGUF, NVFP4, and GPTQ-Int4 formats.

Deep Dive

LLMFan46 has released Qwen3.5-35B-A3B-uncensored-heretic-v2 with native MTP preserved, available in Safetensors, GGUF, NVFP4, and GPTQ-Int4 formats. The model shows a KL divergence of 0.0487 with an accuracy loss of 0.40%. It is intended for general-purpose AI assistance, whereas the Qwen3.6 models mainly target agentic and coding tasks. A benchmark is also included.

Key Points

Retains all 785 MTPs (Multi-Token Prediction heads) for speculative decoding, available in Safetensors, GGUF, NVFP4, and GPTQ-Int4 formats.
KL divergence of 0.0487 with only 0.40% accuracy loss, demonstrating low impact from abliteration compared to Qwen3.6's higher sensitivity.
Optimized for general-purpose AI assistance, complementing Qwen3.6's focus on agentic and coding use cases.
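
For readers unfamiliar with the KL divergence figure quoted above, the following is a minimal sketch of how such a number could be computed. The post does not describe the measurement procedure, so everything here is an assumption for illustration: the direction KL(original || modified), the unit (nats), the averaging over token positions, and the toy logits. The function names are hypothetical and do not come from the release.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with p_i = 0 contribute nothing
    )


def mean_kl(original_logits, modified_logits):
    """Average per-position KL between two models' next-token distributions."""
    per_position = [
        kl_divergence(softmax(orig), softmax(mod))
        for orig, mod in zip(original_logits, modified_logits)
    ]
    return sum(per_position) / len(per_position)


# Toy data (assumed): logits over a 3-token vocabulary at 2 positions.
original = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
modified = [[1.9, 1.1, 0.1], [0.5, 2.4, 0.4]]
print(f"mean KL: {mean_kl(original, modified):.4f}")
```

A value near zero means the modified model's next-token distributions stay close to the original's on the prompts measured, which is the sense in which a low KL divergence is read as low impact.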

Why It Matters

Provides a high-quality uncensored general-purpose model with minimal performance loss, ideal for local AI experimentation and deployment.
