Qwen3.5 27B vs Devstral Small 2 - Next.js & Solidity (Hardhat)
An informal benchmark shows Qwen3.5 27B matching Mistral's Devstral Small 2 in agentic coding, with the 35B variant refactoring a 900+ LoC file.
A developer's widely shared informal benchmark has sparked discussion by comparing Alibaba's open-source Qwen3.5 models (27B and 35B parameters) against Mistral's popular Devstral Small 2 on real-world coding tasks. The test focused on agentic challenges within a production repository using Next.js and Solidity with Hardhat. The reviewer noted that Qwen3.5 27B and Devstral Small 2 performed nearly identically in execution and token efficiency for repo work, but the Qwen model produced more extensive documentation, especially at Q6 quantization. The larger Qwen3.5 35B model demonstrated significant refactoring capability, though the reviewer noted it may over-engineer solutions without proper guidance.
The setup used a custom-built llama.cpp with CUDA support on an RTX 5090 GPU and a Ryzen 9 9950X CPU. The benchmark's creator ran 78 agentic challenges to settle the comparison and still found the choice difficult. The results align with third-party analysis from Artificial Analysis, which places Qwen3.5 27B's performance suspiciously close to models like Claude 3.5 Sonnet and DeepSeek V3.2. The test highlights how quickly the performance gap between leading open-source coding models is closing, giving developers more viable, cost-effective alternatives for complex software engineering tasks such as planning, coding, and refactoring.
- Qwen3.5 27B matched Mistral's Devstral Small 2 in execution on 78 agentic coding challenges for Next.js and Solidity.
- The Qwen3.5 35B model refactored a single 900+ line file into 35 separate parts, showing it can handle large refactors.
- Third-party analysis (Artificial Analysis) ranks Qwen3.5 27B close to top-tier models like Claude 3.5 Sonnet and DeepSeek V3.2.
Why It Matters
Open-source models are now competitive for complex software engineering, giving developers powerful, cost-effective alternatives for coding and refactoring.