Developer Tools

v0.19.0

The latest update fixes a memory leak in MLX, improves KV cache hit rates, and adds a built-in web search plugin.

Deep Dive

Ollama, the open-source platform for running large language models locally, has released version 0.19.0. The headline feature is a built-in web search plugin, started by running `ollama launch pi`, which lets models retrieve and incorporate real-time information from the web and makes local AI agents more capable without requiring complex external setups.
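As an illustrative sketch of how search typically reaches a model, a web-search capability is usually surfaced as a tool definition in a chat request. The payload below follows Ollama's `/api/chat` tool-calling convention; the tool name `web_search` and its schema are assumptions for illustration, not the plugin's actual interface.

```python
import json

# Sketch: exposing a hypothetical "web_search" tool to a model via Ollama's
# /api/chat request body. The model can then emit a tool call with a query,
# which the caller executes and feeds back as a tool message.
payload = {
    "model": "llama3.2",  # any locally pulled model that supports tool calling
    "messages": [
        {"role": "user", "content": "What changed in Ollama v0.19.0?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "web_search",  # assumed name, for illustration only
                "description": "Search the web and return top results.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "Search query text",
                        }
                    },
                    "required": ["query"],
                },
            },
        }
    ],
}

# Serialized as it would be POSTed to http://localhost:11434/api/chat
body = json.dumps(payload)
print(payload["tools"][0]["function"]["name"])  # → web_search
```

The built-in plugin removes the need to wire up this loop by hand, which is the "without complex external setups" part of the release.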

Beyond the new plugin, v0.19.0 is a substantial stability and performance update. It patches a critical memory leak in the KV cache snapshot mechanism of the MLX runner (used on Apple Silicon) and improves the KV cache hit rate when using the Anthropic-compatible API, which can make inference faster and cheaper. The update also resolves model-specific issues, including flash attention being incorrectly enabled for Grok models and tool call parsing failures with Qwen3.5, improving compatibility and reliability across the supported model ecosystem.
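A KV cache hit depends on consecutive requests sharing an identical prompt prefix: cached attention keys and values can be reused only up to the first token that differs, and everything after that must be recomputed. A minimal conceptual sketch (not Ollama's internal implementation; token IDs are made up) of measuring that reusable prefix:

```python
def shared_prefix_len(a: list[int], b: list[int]) -> int:
    """Length of the common leading run of tokens between two prompts.

    KV cache entries computed for this prefix can be reused; tokens after
    the first mismatch must be recomputed from scratch.
    """
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Two chat turns that share a system prompt and earlier history: the second
# turn appends new tokens to the first, so the entire first prompt is reusable.
turn_1 = [101, 7, 7, 42, 9, 13]
turn_2 = [101, 7, 7, 42, 9, 13, 55, 60]
print(shared_prefix_len(turn_1, turn_2))  # → 6
```

This is why keeping system prompts and conversation history byte-stable across requests raises the hit rate, and why a higher hit rate translates directly into less prefill work per request.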

Key Points
  • Adds a native web search plugin, accessible via `ollama launch pi`, for real-time information retrieval.
  • Fixes a memory leak in the MLX runner's KV cache and improves KV cache hit rates for better performance.
  • Resolves model-specific bugs for Grok and Qwen3.5, improving stability and tool-calling functionality.

Why It Matters

By adding web access, this update makes local AI agents more capable, while the memory leak fix and cache improvements make local inference faster and more stable for developers.