Open Source

Unsloth updated (requantized) Qwen3-Coder-Next

The fine-tuning specialists released requantized models with 2-3x faster inference and improved accuracy.

Deep Dive

Unsloth, a company specializing in faster and more memory-efficient fine-tuning of open-source LLMs, has made good on its promised update, releasing requantized versions of Alibaba's Qwen3-Coder-Next model. The key announcement is the complete removal of MXFP4 (a 4-bit floating-point format) layers from its quantizations. Instead, the team 'requantized' the models using a new methodology centered on the Kullback–Leibler Divergence (KLD) metric, which measures how far the quantized model's output distribution drifts from the full-precision model's; using it to guide quantization minimizes the information lost when the model's numerical precision is reduced.
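
To make the idea concrete, here is a minimal sketch (not Unsloth's actual pipeline) of how a KLD metric can score a quantization: compare the full-precision and quantized models' next-token distributions on calibration text, where a lower divergence means less information was lost. All names and numbers below are illustrative.

    import numpy as np

    def kl_divergence(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
        """KL(P || Q) between the softmax distributions of two logit vectors."""
        p = np.exp(p_logits - p_logits.max())
        p /= p.sum()
        q = np.exp(q_logits - q_logits.max())
        q /= q.sum()
        eps = 1e-12  # guard against log(0)
        return float(np.sum(p * np.log((p + eps) / (q + eps))))

    # Illustrative stand-ins: logits from an fp16 reference model and from a
    # candidate quantization of the same model, on one calibration token.
    rng = np.random.default_rng(0)
    full_logits = rng.standard_normal(32000)  # hypothetical vocabulary size
    quant_logits = full_logits + 0.05 * rng.standard_normal(32000)
    print(f"per-token KLD: {kl_divergence(full_logits, quant_logits):.6f}")

Averaged over many calibration tokens, a score like this lets a quantizer rank candidate bit-width assignments and keep the one that diverges least from full precision.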

This shift from MXFP4 layers to KLD-guided quantization is significant for developers. Benchmarks shared by Unsloth show the updated models achieving 2-3x faster inference while reportedly improving accuracy on code-generation tasks. In practice, developers and researchers running these quantized models as local coding assistants can expect snappier responses and more reliable code suggestions without sacrificing capability. The update also underscores how quickly model-optimization techniques are evolving beyond simple bit reduction toward smarter methods that preserve performance, making powerful code models more accessible and efficient for local deployment.
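
For local use, a quantized release of this kind can typically be run with llama-cpp-python, assuming it ships as GGUF files as Unsloth's quantizations generally do. The sketch below assumes you have already downloaded one such file; the filename is a placeholder, not a confirmed artifact name.

    from llama_cpp import Llama

    llm = Llama(
        model_path="Qwen3-Coder-Next-Q4_K_M.gguf",  # placeholder local filename
        n_ctx=8192,        # roomy context window for coding tasks
        n_gpu_layers=-1,   # offload all layers to GPU if one is available
    )

    out = llm(
        "Write a Python function that reverses a singly linked list.",
        max_tokens=256,
        temperature=0.2,   # low temperature for more deterministic code
    )
    print(out["choices"][0]["text"])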

Key Points
  • Unsloth removed all MXFP4 layers from its Qwen3-Coder-Next quantizations.
  • New quantization uses a KLD (Kullback–Leibler Divergence) metric to better preserve model performance.
  • Results show 2-3x faster inference speeds with maintained or improved code generation accuracy.

Why It Matters

Enables faster, more accurate local code AI assistants, making developer tools more efficient and accessible.