Developer Tools

Ollama v0.30.0 shifts to direct llama.cpp support, adds GGUF and MLX

GGML is out, GGUF and MLX are in: a major architecture change for local AI.

Deep Dive

Ollama's v0.30.0 pre-release marks a fundamental architecture change: the project now supports llama.cpp directly instead of building on GGML. The shift brings full compatibility with GGUF, the current standard file format for quantized llama.cpp models. Ollama also uses Apple's MLX framework to accelerate inference on Apple Silicon (M-series chips), which promises faster local runs for Mac users.
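As a rough illustration of what GGUF compatibility means at the file level, the sketch below checks whether a local model file is GGUF. It relies on the GGUF layout, where a file starts with the four ASCII bytes "GGUF" followed by a 32-bit format version. The helper name read_gguf_version and the little-endian byte order are assumptions made for this example; this is not how Ollama itself loads models.

```python
import struct

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def read_gguf_version(path):
    """Return the GGUF format version, or None if the file is not GGUF.

    Hypothetical helper for illustration. Assumes the version field is a
    little-endian uint32, which is the common case.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    # Too short, or wrong magic: not a GGUF file (e.g. an older GGML-era file).
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling read_gguf_version("model.gguf") on a valid file returns its format version as an integer; any other file returns None.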

Installation is straightforward: Mac and Linux users run the curl install script with OLLAMA_VERSION=0.30.0-rc20 set; Windows users set the same environment variable and invoke the PowerShell installer. Two known limitations exist: the laguna-xs.2 and llama3.2-vision models are not yet supported in this pre-release. The team is actively seeking community feedback on performance changes, new errors or crashes, and memory utilization improvements or regressions.
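For readers who want to send the team concrete performance numbers, here is a minimal sketch that confirms which version is running and measures generation throughput through Ollama's local REST API. It assumes the server listens on the default http://localhost:11434 and that the /api/version and /api/generate endpoints, along with the eval_count and eval_duration response fields, behave in this pre-release as they do in current stable releases. The function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def tokens_per_second(response):
    """Compute generation throughput from an /api/generate response dict.

    eval_count is the number of generated tokens; eval_duration is reported
    in nanoseconds. Returns None if either field is missing or zero.
    """
    count = response.get("eval_count")
    duration_ns = response.get("eval_duration")
    if not count or not duration_ns:
        return None
    return count / (duration_ns / 1e9)


def installed_version(base_url=OLLAMA_URL):
    """Ask the running server which version it is (GET /api/version)."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        return json.load(resp)["version"]


def generate(model, prompt, base_url=OLLAMA_URL):
    """Run one non-streaming generation and return the parsed JSON reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)
```

Running the same model and prompt through generate() before and after upgrading, then comparing tokens_per_second() on the two replies, gives a simple like-for-like figure to include in a feedback report.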

Key Points
  • Architecture change from GGML to direct llama.cpp support, aligning with GGUF format
  • MLX integration for faster inference on Apple Silicon (M-series chips)
  • Known limitation: laguna-xs.2 and llama3.2-vision models not yet supported

Why It Matters

The release simplifies model compatibility and should boost performance on Macs, though two models remain unsupported for now.