Open Source

qwen3.5-35b-a3b is a gem

A developer's 12-second test suggests this 35B-parameter model can generate better docs than a model 3.5x its size.

Deep Dive

A developer's viral Reddit post has spotlighted the Qwen 3.5 35B A3B model from Alibaba's Qwen team as a surprisingly efficient powerhouse for automated code documentation. (The "A3B" suffix follows Qwen's mixture-of-experts naming convention, indicating roughly 3 billion active parameters per token, which goes a long way toward explaining the model's speed.) In a hands-on test, the user ran the 6-bit quantized version locally in LM Studio on an Apple M4 Max laptop with 128GB of RAM, achieving an inference speed of 80-90 tokens per second. For the specific task of generating and updating Python docstrings, the model processed and rewrote a file in just 12 seconds. The most striking claim: the developer subjectively judged the output to be of slightly higher quality than that of a much larger 122-billion-parameter model, suggesting exceptional efficiency for its size.
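Throughput figures like 80-90 tokens/sec are easy to reproduce against LM Studio's OpenAI-compatible local server. The sketch below is a minimal illustration, not the developer's actual setup; the `localhost:1234` endpoint is LM Studio's default, and the model name is a hypothetical placeholder for whatever the local instance has loaded:

```python
import json
import time
import urllib.request

# Assumption: LM Studio's local server is running with its default
# OpenAI-compatible endpoint; adjust host/port/model to your setup.
ENDPOINT = "http://localhost:1234/v1/chat/completions"

def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput: generated-token count over wall-clock time."""
    return completion_tokens / elapsed_s

def time_completion(prompt: str, model: str = "qwen3.5-35b-a3b") -> float:
    """POST one chat request and derive tokens/sec from the usage stats."""
    body = json.dumps({
        "model": model,  # hypothetical name; use whatever LM Studio reports
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    elapsed = time.monotonic() - start
    # OpenAI-style responses report generated tokens under usage.completion_tokens.
    return tokens_per_second(data["usage"]["completion_tokens"], elapsed)
```

At the reported rate, a 12-second run corresponds to roughly a thousand generated tokens, which is plausible for a file's worth of docstrings.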

The practical workflow relied on a custom automation tool named 'llmaid,' which the developer hosts on GitHub. The tool uses configuration profiles (like `code-documenter.yaml`) to define tasks: it sends batches of files from a local repository to the locally hosted LLM via an API endpoint, instructs the model to rewrite their contents, and then automatically replaces the files on disk. This points toward highly personalized, local AI coding assistants that can rapidly improve codebase maintainability without relying on cloud APIs, giving developers speed, privacy, and cost control.

Key Points
  • The 35B parameter Qwen model ran locally at 80-90 tokens/sec on an M4 Max, rewriting a file's docs in 12 seconds.
  • User subjectively rated its docstring output as better than a 122B parameter model, highlighting its quality-for-size efficiency.
  • Workflow used a custom 'llmaid' tool with YAML profiles to batch-process an entire code repository automatically.

Why It Matters

It shows that smaller, locally runnable models can match much larger cloud models on well-scoped tasks, enabling faster, private, and cheaper AI-assisted development.