Ollama v0.30.0 shifts to direct llama.cpp support, adds GGUF and MLX
GGML is out, GGUF and MLX are in: a major architecture change for local AI.
Ollama's v0.30.0 pre-release marks a fundamental architecture change: the project now supports llama.cpp directly instead of building on GGML. The shift brings full compatibility with GGUF, the file format llama.cpp currently uses for quantized models. Ollama also now uses Apple's MLX framework to accelerate inference on Apple Silicon (M1, M2, M3), which should mean faster local runs for Mac users.
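As a rough illustration of what GGUF compatibility means at the file level, the Python sketch below checks whether a file begins with the 4-byte GGUF magic and reads the version field that follows it. It assumes the little-endian header layout used by most published GGUF files. The function names are hypothetical and are not part of Ollama or llama.cpp.

```python
import struct

# GGUF files start with the ASCII bytes "GGUF", followed by a uint32 version.
GGUF_MAGIC = b"GGUF"


def parse_gguf_header(header: bytes):
    """Return the GGUF version from the first 8 bytes, or None if not GGUF.

    Assumption: the version field is little-endian.
    """
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version


def read_gguf_version(path: str):
    """Open a model file and report its GGUF version, or None if not GGUF."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(8))
```

Calling read_gguf_version on a downloaded model file gives a quick sanity check before loading it: None means the file is not GGUF (or is truncated) and may need to be converted first.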
Installation is straightforward: Mac and Linux users run the curl install script with OLLAMA_VERSION=0.30.0-rc20 set, while Windows users set the same environment variable and invoke the PowerShell installer. Two models are not yet supported in this pre-release: laguna-xs.2 and llama3.2-vision. The team is actively seeking community feedback on performance changes, new errors or crashes, and memory utilization improvements or regressions. This release is a significant step forward for local AI deployment.
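The install step can be sketched as a small Python helper that prints the command for the current platform without running it. The installer URLs and the exact command shapes are assumptions based on Ollama's usual install instructions, since the summary above does not quote them, and install_command is a hypothetical name. Check the release notes before running anything.

```python
import platform
from typing import Optional

# Assumed installer locations; the summary above does not spell them out.
UNIX_INSTALLER = "https://ollama.com/install.sh"
WINDOWS_INSTALLER = "https://ollama.com/install.ps1"


def install_command(version: str, system: Optional[str] = None) -> str:
    """Build (but do not execute) the install command for a pinned version."""
    system = system or platform.system()
    if system == "Windows":
        # PowerShell: set the variable, then pipe the installer into iex.
        return f'$env:OLLAMA_VERSION="{version}"; irm {WINDOWS_INSTALLER} | iex'
    # macOS and Linux: pass the variable to the shell that runs the script.
    return f"curl -fsSL {UNIX_INSTALLER} | OLLAMA_VERSION={version} sh"


if __name__ == "__main__":
    print(install_command("0.30.0-rc20"))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere and lets the reader inspect the pinned version string first.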
- Architecture change from GGML to direct llama.cpp support, aligning with GGUF format
- MLX integration for faster inference on Apple Silicon (M-series chips)
- Known limitation: laguna-xs.2 and llama3.2-vision models not yet supported
Why It Matters
Simplifies model compatibility and boosts performance on Mac, but minor model gaps remain.