Qwen3.5-35B-A3B-Uncensored-Claude-Opus-4.6-Affine
A new merged model runs at 17-18 tokens/sec on a 12GB RTX 3060, blending reasoning skills from a Claude Opus distillation.
Independent developer LuffyTheFox has released a new, highly capable local AI model by merging two existing variants of Alibaba's Qwen 3.5 35B architecture. The model, named 'Qwen3.5-35B-A3B-Uncensored-Claude-Opus-4.6-Affine,' combines the aggressive, uncensored nature of HauhauCS's version with the advanced reasoning skills distilled from Anthropic's Claude Opus 4.6 in Jackrong's model. Crucially, the merge was performed entirely within Google Colab's free tier while the model remained in its compressed IQ4_XS format, a technical feat that demonstrates advanced model surgery techniques.
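The developer's actual merge script has not been published, but the 'Affine' in the model's name suggests an affine (weighted) combination of corresponding weight tensors from the two source models. A minimal sketch of that idea, with hypothetical tensor names and a made-up mixing coefficient, might look like this:

```python
import numpy as np

def affine_merge(w_a, w_b, alpha=0.5):
    """Affine combination of two weight tensors: alpha*A + (1 - alpha)*B.

    In a real merge this would run per layer over the (dequantized or
    quantization-aware) tensors of both source models.
    """
    return alpha * w_a + (1.0 - alpha) * w_b

# Toy stand-ins for one layer's weights from each source model.
rng = np.random.default_rng(0)
w_uncensored = rng.standard_normal((4, 4))  # hypothetical HauhauCS-side tensor
w_distilled = rng.standard_normal((4, 4))   # hypothetical Jackrong-side tensor

merged = affine_merge(w_uncensored, w_distilled, alpha=0.6)
```

In practice, doing this while the weights remain in IQ4_XS would require operating on the quantized blocks rather than plain float arrays, which is what makes the feat described above unusual.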
The result is a model that activates only 3 billion parameters per token (the 'A3B' in its name), allowing it to run efficiently on consumer hardware such as an NVIDIA RTX 3060 with 12GB of VRAM, where it generates 17-18 tokens per second. The developer applied a specialized script to transfer the 'thinking skills' from the Claude-distilled model and used KL divergence, a statistical measure of how far one probability distribution diverges from another, to clean up inconsistencies, particularly in sensitive layers such as attention mechanisms. The model excels at programming tasks, demonstrated by building a Tron: Legacy-styled Arkanoid game, and at natural communication, all without built-in content filters, marking a significant step in democratizing powerful, uncensored AI for local deployment.
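The exact refinement procedure is not documented, but KL divergence itself is straightforward: it quantifies how much one probability distribution (say, a merged layer's next-token distribution) diverges from a reference. A hedged sketch, with toy distributions standing in for real model outputs:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) = sum p * log(p / q) over a discrete distribution.

    eps avoids log(0); both inputs are renormalized to sum to 1.
    """
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical next-token distributions from the reference and merged models.
p_reference = [0.70, 0.20, 0.10]
p_merged = [0.65, 0.25, 0.10]

# A small value indicates the merged layer still behaves like the reference;
# a large value would flag a layer needing correction.
print(kl_divergence(p_reference, p_merged))
```

Applied per layer, a check like this can flag where a merge has drifted, which matches the article's note that sensitive components such as attention layers needed the most cleanup.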
- Merges two Qwen 3.5 35B models to combine uncensored output with Claude Opus 4.6 reasoning skills.
- Runs at 17-18 tokens/sec on a consumer RTX 3060 12GB GPU thanks to its 3-billion-active-parameter (A3B) design.
- Refined using KL divergence in Google Colab's free tier without decompressing the IQ4_XS format, showcasing advanced model editing.
Why It Matters
It makes powerful, uncensored AI with advanced reasoning accessible to developers and hobbyists with modest consumer-grade hardware.