Open Source

Is anyone getting real coding work done with Qwen3.6-35B-A3B-UD-Q4_K_M on a 32GB Mac in opencode, claude code or similar?

Developer tests Qwen3.6-35B for real coding work, finds 32K token limit cripples complex bug-fixing tasks.

Deep Dive

A developer's viral experiment with Alibaba's Qwen3.6-35B-A3B-UD-Q4_K_M model reveals the harsh reality of running high-performance coding AIs locally. On a 32GB M2 MacBook Pro using OpenCode and llama.cpp, memory constraints forced a drastic reduction of the model's context window from its native 262,144 tokens down to just 32,768. This severely limited the AI's ability to handle complex, multi-file coding tasks, such as debugging a full-stack application where the developer themselves couldn't initially spot the issue.
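The arithmetic behind that reduction is straightforward: the KV cache grows linearly with context length, and at the model's native window it alone would dwarf a 32GB machine's free memory. The sketch below is a generic grouped-query-attention estimate; the layer count, KV-head count, and head dimension are illustrative assumptions, not published specs for this model.

```python
# Rough KV-cache sizing for a grouped-query-attention transformer.
# N_LAYERS, N_KV_HEADS, and HEAD_DIM are assumed illustrative values,
# not confirmed Qwen3.6-35B-A3B architecture parameters.
N_LAYERS = 48      # assumed transformer layers
N_KV_HEADS = 4     # assumed KV heads (GQA)
HEAD_DIM = 128     # assumed per-head dimension
BYTES_PER_VAL = 2  # fp16 key/value entries

def kv_cache_gib(ctx_tokens: int) -> float:
    """GiB needed for keys + values across all layers at a given context."""
    total_bytes = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * ctx_tokens * BYTES_PER_VAL
    return total_bytes / 2**30

for ctx in (32_768, 131_072, 262_144):
    print(f"{ctx:>7} tokens -> {kv_cache_gib(ctx):5.1f} GiB KV cache")
# -> 32K costs ~3 GiB, while the native 262K window costs ~24 GiB
#    on top of the quantized weights.
```

Under these assumptions, the full 262K window is roughly eight times the cache footprint of the 32K window the developer was forced to use.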

During testing, the Qwen3.6-35B model successfully identified the core bug—demonstrating its analytical capability—but consistently failed during the implementation phase. The critical failure occurred during context "compaction," where the system discards old information to stay within the window. After the second compaction pass, the model lost crucial task details, even misremembering directory names, leaving it unable to finish the work. The developer's configuration, including a quantized GGUF model and aggressive GPU offloading (-ngl 99), still couldn't overcome the 32GB RAM barrier for this 35-billion-parameter model.
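The failure mode is easy to see in miniature. The toy sketch below drops the oldest turns when a conversation exceeds a token budget; it is a generic illustration of truncation-style compaction, not OpenCode's actual algorithm, and the whitespace "tokenizer" is a crude stand-in for a real one.

```python
# Minimal sketch of context "compaction": when the conversation exceeds
# the token budget, the oldest turns are dropped first. This is a
# generic illustration, NOT OpenCode's real compaction strategy.
def count_tokens(text: str) -> int:
    # crude stand-in for a real tokenizer
    return len(text.split())

def compact(turns: list[str], budget: int) -> list[str]:
    """Drop the oldest turns until the conversation fits the budget."""
    kept = list(turns)
    while kept and sum(count_tokens(t) for t in kept) > budget:
        kept.pop(0)  # earliest context (task setup, file paths) goes first
    return kept

# Hypothetical session history for illustration.
history = [
    "task: fix the login bug in services/auth/session.py",
    "analysis: stale refresh token is not invalidated on logout",
    "patch attempt 1: edit session.py, run tests",
    "patch attempt 2: tests still failing, inspect middleware",
]
# The original task description, including the directory name, is the
# first thing discarded once the budget is exceeded.
print(compact(history, budget=25))
```

This mirrors the reported symptom: after enough compaction passes, the original task statement and its file paths are gone, so the model starts "misremembering" directory names it can no longer see.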

The experiment underscores a key industry tension: while models like Qwen3.6 are architected for extended context (officially recommending at least 128K tokens for complex tasks), consumer hardware like Apple's 32GB Macs can't provide the necessary memory capacity. This creates a "must be this tall to ride" scenario, where the model's advanced capabilities are gated by expensive hardware, limiting its practical utility for professional developers seeking a local, private coding assistant alternative to cloud services like Claude Code.
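A back-of-envelope budget shows how little room is left once the quantized weights are resident. Every figure below is a rough assumption for illustration: an approximate bits-per-weight for Q4_K_M, an assumed per-token KV footprint, and an assumed OS reserve; none are measured values from the developer's setup.

```python
# Back-of-envelope memory budget for a ~35B Q4 model on a 32 GB Mac.
# All constants are rough assumptions for illustration only.
TOTAL_RAM_GIB = 32.0
OS_RESERVE_GIB = 8.0                   # assumed macOS + app overhead
WEIGHTS_GIB = 35e9 * 0.55 / 2**30      # ~4.4 bits/weight for Q4_K_M (approx)
KV_PER_TOKEN_GIB = 96 * 1024 / 2**30   # assumed ~96 KiB of KV cache per token

remaining = TOTAL_RAM_GIB - OS_RESERVE_GIB - WEIGHTS_GIB
max_ctx = int(remaining / KV_PER_TOKEN_GIB)
print(f"weights ~{WEIGHTS_GIB:.1f} GiB, room for ~{max_ctx:,} tokens of KV cache")
```

Under these assumptions the ceiling lands in the mid-tens of thousands of tokens, far below both the native 262K window and the officially recommended 128K, and with no slack for llama.cpp's compute buffers, which is consistent with the developer settling on a conservative 32K.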

Key Points
  • Qwen3.6-35B model, designed for 262K tokens, was crippled to a 32K context window on a 32GB M2 Mac due to memory exhaustion.
  • The model could identify a complex full-stack bug that stumped the original developer, but failed implementation due to catastrophic context compaction.
  • Official model documentation states that maintaining at least 128K tokens of context is required to preserve the model's "thinking capabilities" for complex tasks.

Why It Matters

Highlights the hardware barrier for practical local AI coding, forcing developers to choose between cloud dependency and significant hardware investment.