Qwen3.6-35B-A3B-Uncensored-Wasserstein-GGUF
An independent researcher uses the Wasserstein metric to patch a critical numerical instability in quantized AI models.
An independent AI researcher, LuffyTheFox, has identified and patched a significant numerical flaw in quantized versions of the popular Qwen3.6-35B-A3B-Uncensored language model. The bug, termed 'tensor drift,' specifically affected the `ssm_conv1d.weight` layers, recurrent state-transition layers crucial to the model's long-context memory. The instability, which also appeared in quantized models from Unsloth, degraded output quality. The researcher's key innovation was using the Wasserstein metric (W1), a statistical distance that measures how far probability mass must move to transform one distribution into another, to detect the drift. Because W1 is sensitive to small shifts in a weight distribution that the traditional Kullback-Leibler divergence can miss once values are binned, it enabled a precise correction.
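The detection idea can be sketched as follows. This is a minimal illustration, not the researcher's actual pipeline: the synthetic weights, the quantization grid, and the injected bias are all assumptions made for demonstration. It compares how W1 and a histogram-based KL divergence respond to a small systematic shift in a quantized weight tensor.

```python
import numpy as np
from scipy.stats import wasserstein_distance, entropy

rng = np.random.default_rng(0)
# Synthetic stand-in for a flattened ssm_conv1d.weight tensor.
weights = rng.normal(0.0, 0.02, size=10_000)

# Simulate aggressive quantization: snap values to a coarse grid, plus a
# tiny systematic bias standing in for the 'tensor drift' described above.
scale = weights.std() * 0.5
quantized = np.round(weights / scale) * scale + 1e-3

# W1 (earth mover's distance) operates directly on the empirical samples
# and registers even a small mean shift.
w1 = wasserstein_distance(weights, quantized)

# KL divergence requires binning into histograms first; with coarse bins a
# small shift can land in the same bin and be underreported.
bins = np.linspace(-0.1, 0.1, 21)
p, _ = np.histogram(weights, bins=bins, density=True)
q, _ = np.histogram(quantized, bins=bins, density=True)
eps = 1e-12  # avoid log(0) in the KL computation
kl = entropy(p + eps, q + eps)

print(f"W1 = {w1:.6f}, KL = {kl:.6f}")
```

Since W1 is bounded below by the absolute difference of the two means, it cannot miss the injected bias, whereas the binned KL estimate depends on where the bin edges fall.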
The patched model, 'Qwen3.6-35B-A3B-Uncensored-Wasserstein-GGUF,' is now available on Hugging Face. It builds on an earlier aggressive quantization by HauhauCS. According to the creator, the fix yields a model that 'talks almost like a human,' remains fully uncensored, handles programming tasks well, and maintains character consistency in long-context roleplay scenarios. Recommended settings for tools like LM Studio include the Q4_K_P quantization, a temperature of 0.7, and adjusted sampling penalties. This community-driven fix highlights the ongoing need for rigorous validation of quantized models, especially for complex architectures like State Space Models (SSMs).
- Researcher LuffyTheFox fixed 'tensor drift' in Qwen3.6-35B-A3B's SSM layers using the Wasserstein (W1) metric for detection.
- The bug affected `ssm_conv1d.weight` layers critical for long-context memory and was also found in Unsloth's quantized models.
- The patched GGUF model is available on Hugging Face and is recommended for uncensored chat, programming, and long-context roleplay.
Why It Matters
This fix helps widely used open-source models run stably after quantization, preserving their long-context reasoning capabilities for developers and researchers.