Ollama v0.30.0 pre-release rewrites stack for llama.cpp, GGUF, and Apple MLX
171K-star project moves from GGML to direct llama.cpp support, targeting faster inference and broader model support.
Deep Dive
Ollama has released v0.30.0-rc15, a pre-release that reworks the architecture to support llama.cpp directly instead of GGML, adds compatibility with the GGUF file format, and uses MLX to accelerate model inference on Apple Silicon. The team is asking for feedback on performance, errors, crashes, and changes in memory utilization. Two models are known not to be supported yet: laguna-xs.2 and llama3.2-vision. To install on Mac or Linux, run the curl-based installer with OLLAMA_VERSION=0.30.0-rc15 set; on Windows, use the PowerShell installer with the same version setting.
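For readers who want to send the kind of performance feedback the team is asking for, here is a minimal Python sketch that reads throughput from a local Ollama server. It assumes the server is running at the default address http://localhost:11434 and that the pre-release keeps the documented REST endpoints /api/version and /api/generate, including the eval_count and eval_duration (nanoseconds) response fields. The model name "llama3.2" and the helper names are placeholders chosen for illustration, not anything taken from the release notes.

```python
import json
import urllib.request

# Assumption: default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434"


def tokens_per_second(response: dict) -> float:
    """Generation throughput from an /api/generate response.

    eval_count is the number of generated tokens and eval_duration is the
    time spent generating them, in nanoseconds.
    """
    duration_ns = response.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    return response.get("eval_count", 0) / (duration_ns / 1e9)


def server_version(base_url: str = OLLAMA_URL) -> str:
    """Return the version string reported by the running server."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=10) as resp:
        return json.load(resp)["version"]


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> dict:
    """Run one non-streaming generation and return the full JSON response."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=600) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder model name: substitute any model already pulled locally.
    print("server version:", server_version())
    result = generate("llama3.2", "Explain GGUF in one sentence.")
    print(f"load time: {result.get('load_duration', 0) / 1e9:.2f} s")
    print(f"throughput: {tokens_per_second(result):.1f} tokens/s")
```

Running the same script before and after upgrading, with the same model and prompt, gives a rough like-for-like comparison to include in a report. It does not measure memory utilization, which would have to be observed separately.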
Key Points
- Switches the backend from GGML to llama.cpp, aiming for better performance, and adds GGUF compatibility
- Uses MLX to accelerate inference on Apple Silicon Macs
- Pre-release with two unsupported models: laguna-xs.2 and llama3.2-vision
Why It Matters
Ollama’s architecture overhaul is aimed at faster local AI on Macs and at supporting the latest open models via GGUF, though the pre-release is still collecting performance feedback.