Ollama v0.30.0 shifts to native llama.cpp & GGUF support
Ollama’s biggest refactor yet: direct llama.cpp compatibility and MLX acceleration for Apple Silicon.
The Ollama v0.30.0 pre-release marks a major architectural overhaul: the popular local LLM runner now integrates directly with llama.cpp rather than relying on the older GGML stack. This change brings native support for the widely used GGUF file format, which simplifies model loading and expands compatibility with community models. In addition, MLX now accelerates inference on Apple Silicon Macs, promising better performance for on-device AI. The team is actively seeking feedback in three areas: performance improvements or regressions, new errors or crashes, and changes in memory utilization.
Two models, laguna-xs.2 and llama3.2-vision, are known to be unsupported in this pre-release. Installation instructions are provided for Mac/Linux (curl) and Windows (PowerShell). With 171k GitHub stars and an active community, this update signals Ollama’s commitment to staying current with the fast-evolving open-source LLM ecosystem. Users upgrading from earlier versions should expect a different underlying engine, so existing workflows may need retesting.
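For readers who want to check for performance regressions while retesting, here is a minimal sketch of one way to measure decode throughput. It assumes the local REST endpoint (`http://localhost:11434/api/generate`) and the `eval_count` / `eval_duration` timing fields behave in the pre-release as they do in earlier versions, which the release notes do not confirm. The helper names `tokens_per_second` and `generate` are hypothetical.

```python
import json
import urllib.request

# Default local endpoint; assumed unchanged in the v0.30.0 pre-release.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Compute decode throughput from a non-streaming /api/generate response.

    eval_count is the number of generated tokens and eval_duration is the
    time spent generating them, reported in nanoseconds.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming generation request and return the parsed JSON."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With a server running, calling `tokens_per_second(generate("<model>", "<prompt>"))` on the old and new versions with the same model and prompt gives a rough before/after comparison. Averaging over several runs reduces noise from model loading and caching.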
- Architecture shift from GGML to direct llama.cpp integration
- Native GGUF file format compatibility for broader model support
- MLX acceleration for Apple Silicon (M-series) inference
- Pre-release status with known limitations: laguna-xs.2 and llama3.2-vision unsupported
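To make the GGUF point concrete, the sketch below reads the fixed-size header of a GGUF file to confirm that a downloaded community model really is in that format. It assumes a little-endian file at GGUF version 2 or later, where the two counts are 64-bit; version 1 files used 32-bit counts and would be misread. The function name `read_gguf_header` is hypothetical and is not part of Ollama or llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(path: str) -> dict:
    """Read the magic, version, tensor count, and metadata count.

    Assumes little-endian layout and GGUF version >= 2 (uint64 counts).
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != GGUF_MAGIC:
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        # uint32 version, then uint64 tensor count and uint64 metadata KV count
        (version,) = struct.unpack("<I", f.read(4))
        tensor_count, metadata_kv_count = struct.unpack("<QQ", f.read(16))
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": metadata_kv_count,
    }
```

A file that fails the magic check is likely in an older GGML-era format or is not a model file at all, which is a quick first check before reporting a load error.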
Why It Matters
Direct llama.cpp integration simplifies model compatibility, and MLX should improve performance on Macs. Both matter for the adoption of local AI tooling.