v0.18.0
The popular local AI runner now connects directly to cloud models without downloads, simplifying workflows.
Ollama, the open-source platform for running large language models (LLMs) locally, has launched version 0.18.0. The update streamlines developer workflows by introducing direct, on-demand access to Ollama's cloud-hosted models. Previously, users had to fetch cloud models with the `ollama pull` command before they could be run. Now, tagging a model with `:cloud` connects to the hosted version automatically, eliminating the pull step and saving storage space. This bridges the gap between local experimentation and scalable cloud deployment.
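As a minimal sketch of how this looks from a client, the snippet below targets a cloud-tagged model through Ollama's local HTTP API. The `/api/chat` endpoint and payload shape follow Ollama's documented API; the model name is an assumption for illustration only.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a non-streaming chat request."""
    return {
        "model": model,  # a `:cloud` tag routes the request to the hosted model
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def chat(model: str, prompt: str) -> str:
    """Send the request to the local Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


# Example (requires a running Ollama 0.18.0+ server with cloud access;
# the model name is hypothetical):
#   print(chat("llama3:cloud", "Summarize the benefits of on-demand cloud models."))
```

The same request against an untagged name would use locally downloaded weights, so switching between local and cloud execution is a one-string change.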
The release also improves integration with Claude Code, Anthropic's agentic coding tool, by adding support for configuring its 'compaction window', the threshold that governs when conversation context is compacted (summarized) so long coding sessions stay within the model's context window. Additionally, the update fixes the ordering of models when multiple instances are run via `ollama run`, leading to more predictable behavior. These refinements, contributed in part by new contributor @flipbit03, solidify Ollama's position as a versatile tool for developers who need to test and run a range of AI models, from local Llama 3 instances to much larger cloud-hosted models, within a unified interface.
- Direct cloud model access: Using the `:cloud` tag connects to models without a manual download, saving time and disk space.
- Enhanced Claude Code support: the 'compaction window' can now be configured, giving finer control over when context is compacted during long sessions.
- Improved model ordering: Better logic for sequencing models when running multiple instances with `ollama run`, boosting reliability.
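The first bullet can be illustrated with a tiny helper that flips a model name between its local and cloud forms. The `:cloud` tag convention is taken from the release notes; the model names are assumptions for illustration.

```python
def with_cloud_tag(model: str, cloud: bool = True) -> str:
    """Return the `:cloud`-tagged name for a model, or the local name as-is.

    Assumes the convention described in the release: the `:cloud` tag makes
    `ollama run` connect to the hosted version instead of pulling weights
    locally.
    """
    base = model.split(":", 1)[0]  # drop any existing tag (e.g. ":latest")
    return f"{base}:cloud" if cloud else model


# Prototype locally, then flip one flag to run against the cloud:
print(with_cloud_tag("llama3", cloud=False))  # llama3
print(with_cloud_tag("llama3"))               # llama3:cloud
```

Because the tag is just part of the model name, the same switch works anywhere a model name is accepted, whether on the CLI or in an API request.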
Why It Matters
The update simplifies the hybrid local/cloud AI development workflow, letting developers prototype locally and scale to the cloud seamlessly.