Ollama v0.30.0 switches to llama.cpp, adds GGUF support
New architecture brings direct llama.cpp support and MLX acceleration for Apple Silicon.
Ollama's v0.30.0 pre-release marks a significant architectural shift, moving from GGML to a direct integration with llama.cpp. The change brings full support for GGUF, the file format that succeeded the older GGML format, which should improve model compatibility and loading speeds. For macOS users, the update introduces acceleration through MLX, Apple's machine learning framework, which promises faster inference on Apple Silicon. Installation is available via curl or PowerShell with version pinning, and the team is actively seeking feedback on performance, memory usage, and any new errors.
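For readers unsure whether a local model file is already in GGUF form, the header can be inspected directly. The sketch below is not part of Ollama; it relies only on the published GGUF layout, in which a file begins with the four ASCII bytes "GGUF" followed by a little-endian 32-bit version number. The function name `read_gguf_version` is made up for this illustration.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_version(path):
    """Return the GGUF version number, or None if the file is not GGUF.

    Hypothetical helper for illustration; it reads only the first 8 bytes.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # The version is an unsigned 32-bit little-endian integer after the magic.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

A return value of None suggests the file is in some other format, such as a legacy GGML file, and would need converting before it could be loaded as GGUF.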
Known issues: laguna-xs.2 is not yet supported on Windows/Linux, llama3.2-vision models are not yet supported, and nomic-embed-text now lowercases input text to match the model card, whereas Ollama previously preserved mixed case. Because this is a pre-release, users should expect some instability and report any regressions. The overhaul positions Ollama for closer long-term alignment with the llama.cpp ecosystem, which matters for running new open-weight models locally.
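The nomic-embed-text change matters most for corpora that contain entries differing only by case, since those entries now reach the model as the same text. The sketch below is one way to find such entries before re-embedding. It assumes the lowercasing behaves like Python's `str.lower()`, which may not match Ollama's normalization exactly, and both function names are hypothetical.

```python
def normalize_for_embedding(text):
    """Approximate the new behavior; assumes plain str.lower() semantics."""
    return text.lower()


def case_collisions(texts):
    """Group inputs that become identical once lowercased.

    Returns a dict mapping each lowercased form to the original strings
    that collapse onto it, keeping only forms with more than one original.
    """
    groups = {}
    for t in texts:
        groups.setdefault(normalize_for_embedding(t), []).append(t)
    return {key: originals for key, originals in groups.items() if len(originals) > 1}
```

Any group this returns is a set of inputs that would previously have been embedded from distinct strings, so stored vectors for them are worth regenerating after upgrading.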
- Architecture switches from GGML to direct llama.cpp integration for better performance and compatibility.
- Adds GGUF file format support, the current standard for llama.cpp models.
- MLX acceleration targets faster inference on Apple Silicon.
- Known gaps: laguna-xs.2 is not yet supported on Windows/Linux, and llama3.2-vision models are not yet supported.
Why It Matters
The architecture overhaul is aimed at improving compatibility and performance for local LLM deployments on Apple Silicon and beyond, though as a pre-release those gains still need to be confirmed by user testing.