Developer Tools

v0.21.1

The update adds official support for Moonshot AI's Kimi K2.6 model and delivers up to 2x faster MLX sampling.

Deep Dive

Ollama, the popular open-source platform for running large language models locally, has released version 0.21.1. The headline feature is official command-line interface (CLI) support for Moonshot AI's Kimi K2.6 model, which users can now pull and run directly via `ollama run kimi-k2.6:cloud`. Kimi K2.6 is designed for complex, long-horizon agentic execution tasks and leverages a multi-agent system architecture, making it a strong fit for developers building automated workflows that require sequential planning and execution.

Beyond the new model support, the update delivers significant performance optimizations for Apple's MLX framework. A major change fuses the top-P and top-K sampling operations into a single sort pass, which the developers report makes sampling up to twice as fast. The release also improves MLX prompt tokenization by moving it into request handler goroutines and enhances thread safety for array management. Additional fixes address structured outputs for the Gemma 4 model and resolve a UI bug in the macOS app where the model picker would show stale data after switching chats.
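Conceptually, fusing the two filters means sorting the logits once and applying both cutoffs to that single sorted view, rather than sorting separately for top-K and top-P. The following plain-Python sketch illustrates the idea; it is not Ollama's actual MLX implementation, and the function name and parameters are illustrative:

```python
import math

def fused_top_k_top_p(logits, k=40, p=0.9):
    """Apply top-K and top-P (nucleus) filtering with one descending sort.

    Returns the surviving token IDs and their renormalized probabilities.
    Illustrative sketch only -- not Ollama's MLX code.
    """
    # Single sort pass: rank (token_id, logit) pairs by logit, descending,
    # and apply the top-K cutoff directly to the sorted result.
    ranked = sorted(enumerate(logits), key=lambda t: t[1], reverse=True)[:k]

    # Softmax over the K survivors (subtract the max for stability).
    m = ranked[0][1]
    weights = [math.exp(l - m) for _, l in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-P on the same sorted list: keep the smallest prefix whose
    # cumulative probability mass reaches p.
    cum, cutoff = 0.0, len(probs)
    for i, q in enumerate(probs):
        cum += q
        if cum >= p:
            cutoff = i + 1
            break

    ids = [t for t, _ in ranked[:cutoff]]
    kept = sum(probs[:cutoff])
    return ids, [q / kept for q in probs[:cutoff]]

ids, probs = fused_top_k_top_p([2.0, 1.0, 0.5, 0.1], k=3, p=0.8)
# ids holds the surviving token IDs; probs their renormalized probabilities,
# ready for random.choices(ids, weights=probs).
```

Because both filters consume the same sorted array, the expensive sort happens once per token instead of twice, which is where the reported speedup comes from.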

Key Points
  • Adds CLI support for Moonshot AI's Kimi K2.6 model, designed for long-horizon agentic tasks.
  • Implements fused top-P and top-K sampling in MLX, making sampling up to 2x faster.
  • Includes performance improvements for GLM4 MoE Lite and fixes for Gemma 4 structured outputs.

Why It Matters

Developers gain a high-performance, local option for running advanced agentic AI models, streamlining complex automation and workflow development.