Developer Tools

v0.17.5

The update adds four new Qwen 3.5 models, ranging from 0.8B to 9B parameters, alongside critical stability fixes.

Deep Dive

Ollama, the open-source platform for running large language models locally, has released version 0.17.5, bringing significant improvements to model support and system stability. The headline feature is the official integration of Alibaba's Qwen 3.5 small model series, now available in four parameter sizes: 0.8B, 2B, 4B, and 9B. The release also addresses several critical bugs that were causing crashes and performance issues, particularly for users running models across mixed hardware configurations. The update follows the platform's rapid growth, evidenced by its 164k GitHub stars, and continues to expand the ecosystem of locally runnable AI models.

The technical improvements in v0.17.5 are substantial. The update fixes a crash that occurred when Qwen 3.5 models were split between GPU and CPU memory, a common configuration for users with limited VRAM. It also resolves a repetition issue in Qwen models by implementing a presence penalty, though users must redownload affected models to pick up the fix. For Apple Silicon users, the MLX engine received fixes for memory issues and crashes. Additionally, the `--verbose` flag now also reports peak memory usage, giving developers a better debugging tool. Together, these changes make Ollama more reliable for production use while expanding the model options available for edge deployment.
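To illustrate the repetition fix: a presence penalty reduces the likelihood of re-emitting tokens the model has already produced by subtracting a flat penalty from their logits before sampling. The sketch below is a minimal, generic illustration of that idea, not Ollama's actual implementation; the function name and the penalty value are assumptions for the example.

```python
def apply_presence_penalty(logits, generated_tokens, penalty=1.0):
    """Return a copy of `logits` with `penalty` subtracted from the logit
    of every token id that has already been generated at least once.
    (Illustrative sketch only -- not Ollama's internal code.)"""
    seen = set(generated_tokens)
    return [
        logit - penalty if token_id in seen else logit
        for token_id, logit in enumerate(logits)
    ]

# Example: token id 2 has already appeared, so its logit is lowered,
# making the sampler less likely to repeat it.
logits = [0.5, 1.2, 3.0, 0.1]
adjusted = apply_presence_penalty(logits, generated_tokens=[2], penalty=1.0)
print(adjusted)  # token 2: 3.0 -> 2.0; all other logits unchanged
```

Unlike a frequency penalty, which scales with how often a token has appeared, a presence penalty applies the same fixed offset whether a token appeared once or many times, which is why it is effective at breaking the kind of looping repetition described here.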

Key Points
  • Adds four new Qwen 3.5 models ranging from 0.8B to 9B parameters for lightweight local AI
  • Fixes a critical crash when models are split between GPU and CPU memory, and resolves a repetition issue that requires a model redownload
  • Improves MLX engine stability on Apple Silicon and adds peak memory usage reporting via the `--verbose` flag

Why It Matters

Expands accessible local AI options with smaller models and makes Ollama more stable for production deployment across different hardware.