Open Source

Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF

A training bug confined to just two tensors was crippling one of the most advanced open-weight AI models available.

Deep Dive

A developer known as LuffyTheFox (FernflowerAI) has diagnosed and repaired a critical flaw in the highly anticipated Qwen3.5-35B-A3B-Uncensored model, an open-weight model notable for its uncensored outputs and hybrid architecture combining Mixture of Experts (MoE) with DeltaNet recurrent layers. The model was suffering from a bizarre failure mode: it performed well on short prompts but degraded rapidly in longer conversations, producing nonsensical 'philosophizing' and broken code. After two weeks of forensic analysis, the root cause was traced to just two tensors in the model's 36th and 37th layers. Due to an anomaly in the AdamW optimization process, these tensors had drifted to a scale roughly 60% above normal, corrupting the hidden state carried by the recurrent DeltaNet layers and causing the model to lose context almost immediately.
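This kind of scale drift can be caught by comparing each tensor's magnitude against structurally equivalent tensors in other layers. The following is a minimal sketch of such a scan, not the developer's actual tooling: the checkpoint filename, the name-grouping heuristic, and the 1.3x outlier threshold are all illustrative assumptions.

```python
# Sketch: flag tensors whose scale is anomalous relative to peers of the same kind.
# Assumes the weights are available as a safetensors checkpoint; file name and
# grouping heuristic are illustrative, not the model's actual layout.
from collections import defaultdict

import torch
from safetensors.torch import load_file

state = load_file("model.safetensors")  # hypothetical checkpoint path

# Group tensors by "kind" (name with layer indices stripped), so that a
# projection in layer 36 is compared against the same projection elsewhere.
groups = defaultdict(list)
for name, t in state.items():
    kind = ".".join(p for p in name.split(".") if not p.isdigit())
    rms = t.float().pow(2).mean().sqrt().item()  # per-tensor RMS scale
    groups[kind].append((name, rms))

for kind, entries in groups.items():
    scales = torch.tensor([s for _, s in entries])
    median = scales.median().item()
    for name, s in entries:
        if median > 0 and s / median > 1.3:  # drifted tensors reportedly sat ~1.6x
            print(f"outlier: {name}  rms={s:.4f}  ({s / median:.2f}x group median)")
```

A scan like this would surface the two layer-36/37 tensors immediately, since a ~60% scale excess stands far outside the normal layer-to-layer variation of a trained network.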

The fix was surgically precise: scaling only the two errant tensors back to their expected values, leaving the other 489 tensors untouched. This single adjustment resulted in an 88.6% reduction in the model's error rate, transforming its performance. Long conversations now remain coherent, and code generation functions correctly. The developer has released the corrected model weights on HuggingFace alongside an optimized system prompt and chat template that supports tool calling. This incident serves as a critical case study for the AI community, highlighting how subtle training artifacts can cripple even the most advanced model architectures and underscoring the importance of rigorous weight inspection, especially for complex hybrid designs like MoE + recurrent networks.
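The repair itself amounts to dividing the two flagged tensors by their drift factor and re-saving the checkpoint. Here is a hedged sketch of that operation; the tensor names are hypothetical placeholders (the real names come from whatever the diagnostic scan flagged), and the 1.6x factor reflects the reported ~60% drift.

```python
# Sketch of the surgical fix: rescale only the two drifted tensors, re-save.
# Tensor names below are hypothetical; substitute the ones the scan flagged.
from safetensors.torch import load_file, save_file

BAD_TENSORS = {
    "model.layers.36.linear_attn.some_proj.weight",  # hypothetical name
    "model.layers.37.linear_attn.some_proj.weight",  # hypothetical name
}
DRIFT_FACTOR = 1.6  # tensors sat roughly 60% above their expected scale

state = load_file("model.safetensors")
for name in BAD_TENSORS:
    state[name] = state[name] / DRIFT_FACTOR  # scale back to the expected range

save_file(state, "model-fixed.safetensors")  # all other tensors untouched
```

Because the edit is a pure rescaling of existing weights, it requires no retraining and leaves the remaining 489 tensors bit-identical to the original release.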

Key Points
  • A training bug in just 2 of the model's 491 tensors crippled long-context performance; correcting them cut the error rate by 88.6%.
  • The bug stemmed from the AdamW optimizer's interaction with rarely activated experts in the final layers of the hybrid MoE + DeltaNet architecture.
  • The surgical fix restored coherent long conversations and correct code generation without modifying any other weights.

Why It Matters

The incident shows how a minor training flaw can cripple even a cutting-edge model, underscoring the need for meticulous weight validation and the value of open-source debugging.