Difference Between Qwen 3 Max-Thinking and Qwen 3.5 on a Spatial Reasoning Benchmark (MineBench)
A new benchmark shows a massive leap for Alibaba's open-source model.
Deep Dive
A new spatial reasoning benchmark, MineBench, shows Qwen 3.5 making an 'insane improvement' over its predecessor, Qwen 3 Max-Thinking. The benchmark's creator reports that some Qwen 3.5 builds performed close to, if not better than, top-tier closed models such as Claude Opus 4.6, GPT-5.2 Pro, and Gemini 3 Pro. This suggests a sharp narrowing of the performance gap between leading open and closed-source AI models on specific reasoning tasks.
Why It Matters
Open-source models may be catching up to the most advanced closed models, which could reshape the competitive landscape.