Open Source

Alibaba's Qwen3.6-Plus model matches GPT-4-Turbo on key benchmarks

r/LocalLLaMA April 02, 2026

⚡ The 72B-parameter open-source model scores 90.1 on MMLU, and its API costs about 99% less than GPT-4.

Deep Dive

Alibaba's Qwen team has officially released Qwen3.6-Plus, a major upgrade to its open-source large language model series. The 72-billion-parameter model scores 90.1 on the MMLU benchmark, a key measure of knowledge and reasoning, which puts it on par with OpenAI's GPT-4-Turbo. It is also strong in specialized areas, scoring 78.7 on the MATH benchmark and 80.2 on HumanEval for coding tasks. The model supports a 128K-token context window and is available both for download and through a new API platform, giving developers and enterprises a compelling cost-to-performance ratio.

This release is part of a broader Qwen3.6 family rollout, which includes smaller variants such as the 14B and 1.5B models. The team emphasizes that Qwen3.6-Plus is designed for complex, multi-step reasoning and can function as a capable AI agent. Its API is priced aggressively, at roughly 1% of the cost of GPT-4. The launch signals intensified competition in open-source AI: it gives enterprises a powerful, commercially viable alternative that could accelerate adoption in applications ranging from coding assistants to advanced analytical tools.

Key Points

Scores 90.1 on MMLU, matching GPT-4-Turbo's performance on knowledge and reasoning.
Features 72B parameters and a 128K context window, excelling in coding (80.2 HumanEval) and math (78.7 MATH).
Offered via a new API with aggressive pricing, providing enterprise-grade AI at ~1% of the cost of GPT-4.

Why It Matters

Delivers GPT-4-level performance to the open-source community, drastically reducing costs and barriers for enterprise AI deployment.
