Ollama v0.30.0 shifts to llama.cpp, unlocks GGUF and MLX
The new architecture builds on llama.cpp rather than directly on GGML, and uses MLX to speed up models on Apple Silicon.
Deep Dive
Ollama's v0.30.0 pre-release moves its backend to direct llama.cpp support instead of building on top of GGML, which adds compatibility with the GGUF file format. MLX now accelerates model inference on Apple Silicon. The team is asking testers to report performance improvements or regressions, new errors or crashes, and changes in memory utilization. Two models are known to be unsupported in this pre-release: laguna-xs.2 and llama3.2-vision.
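For readers who want to send the team concrete performance numbers, here is a minimal Python sketch that computes generation throughput from the timing fields Ollama returns on a non-streaming /api/generate call (eval_count, and eval_duration in nanoseconds). The endpoint URL is the usual local default and the helper names are illustrative assumptions, not part of the release notes.

```python
import json
import urllib.request

# Assumed default local endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Return generated tokens per second from a non-streaming response."""
    eval_count = response["eval_count"]            # number of tokens generated
    eval_duration_ns = response["eval_duration"]   # generation time in nanoseconds
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def run_prompt(model: str, prompt: str) -> dict:
    """Send one non-streaming generate request and return the parsed JSON."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Running the same prompt against the same model on the current stable build and on the pre-release, then comparing tokens_per_second(run_prompt(model, prompt)) for each, gives a rough before-and-after figure. Any model name you pass is your own choice; results will vary with hardware and prompt length.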
Key Points
- Moves from building directly on GGML to a llama.cpp backend, enabling direct GGUF file support.
- Adds MLX acceleration for Apple Silicon, improving inference speed.
- Pre-release: no llama3.2-vision or laguna-xs.2 support yet; feedback requested.
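As a rough illustration of what a GGUF file looks like on disk, the sketch below parses the fixed-size header: the 4-byte magic "GGUF", a 32-bit version, and two 64-bit counts for tensors and metadata entries. It assumes a little-endian file at GGUF version 2 or later (version 1 used 32-bit counts), and the function name is hypothetical, not an Ollama or llama.cpp API.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header from the first bytes of a file.

    Assumes little-endian layout and GGUF version >= 2.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes to read a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file: bad magic bytes")
    # "<" selects little-endian with no padding: I = uint32, Q = uint64.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

To check a local model file, open it in binary mode and pass the first 24 bytes, for example read_gguf_header(f.read(24)). A ValueError here is a quick sign that the file is not GGUF at all.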
Why It Matters
The change promises faster local inference and broader model-file compatibility, both of which matter to developers running LLMs on consumer hardware.