Ollama v0.30.0 shifts to llama.cpp, adds GGUF support
Ollama's new pre-release uses llama.cpp directly, supports the GGUF file format in full, and adds MLX acceleration on Apple Silicon.
Ollama, the popular local LLM runner with 173k GitHub stars, has released v0.30.0 as a pre-release. The version makes a fundamental architecture change: Ollama now supports llama.cpp directly instead of building on top of GGML, which enables full compatibility with the GGUF file format. On Apple Silicon, it adds MLX acceleration to speed up model inference. The team is asking for feedback on performance, errors, crashes, and memory utilization.
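Since GGUF compatibility is the headline change, it can help to confirm that a model file really is GGUF before loading it. The sketch below reads the fixed-size header that GGUF version 2 and later place at the start of a file: a 4-byte magic, a 32-bit version, then 64-bit tensor and metadata counts. It assumes a little-endian file, and the function name `read_gguf_header` is illustrative, not part of Ollama or llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (metadata KV count)


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header (v2+ layout, little-endian assumed)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={data[:4]!r}")
    # "<IQQ" = little-endian uint32, uint64, uint64 with no padding.
    # GGUF v1 used 32-bit counts, so this layout only applies to v2 and later.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

To use it on a real file, pass in the first 24 bytes, for example `read_gguf_header(open(path, "rb").read(24))`, where `path` points at a local model file of your choosing.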
Known limitations: laguna-xs.2 is not supported on Windows or Linux, llama3.2-vision is not yet supported, and nomic-embed-text now converts all inputs to lowercase, which matches the model card and fixes a prior bug. To install the pre-release, pin version 0.30.0-rc31 when running the curl or PowerShell installer. The release aims to modernize Ollama's backend while keeping it easy to run LLMs locally.
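The lowercasing change means inputs that differ only in letter case should now be treated as the same text by nomic-embed-text. A minimal sketch of how to spot such inputs in a corpus before embedding it follows. It assumes plain `str.lower()` is a reasonable stand-in for whatever normalization Ollama applies, which the release notes do not spell out, and both function names are hypothetical.

```python
def normalize_for_embedding(text: str) -> str:
    """Stand-in for the lowercasing step (assumption: plain str.lower())."""
    return text.lower()


def case_collisions(texts: list[str]) -> dict[str, list[str]]:
    """Group inputs that become identical once lowercased."""
    groups: dict[str, list[str]] = {}
    for text in texts:
        groups.setdefault(normalize_for_embedding(text), []).append(text)
    # Keep only normalized forms shared by more than one original input.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Applications that relied on case to tell inputs apart, such as acronyms versus ordinary words, may want to run a check like this and re-embed stored vectors created under the old behavior.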
- Architecture switched from GGML to direct llama.cpp support for full GGUF compatibility
- MLX acceleration added for Apple Silicon, improving inference speed
- Known issues: laguna-xs.2 unsupported on Windows/Linux, llama3.2-vision unsupported, nomic-embed-text now lowercases inputs
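To put a number on inference speed when reporting feedback, one option is to compute tokens per second from the timing fields that Ollama's `/api/generate` endpoint returns in non-streaming mode. The sketch below assumes a server on the default `localhost:11434` address and that v0.30.0 still reports `eval_count` and `eval_duration` (in nanoseconds) as earlier versions do. The model name in the usage note is only an example.

```python
import json
import urllib.request

# Default local endpoint (assumed unchanged in the pre-release).
OLLAMA_URL = "http://localhost:11434/api/generate"


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming generate request and return the JSON response."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def tokens_per_second(response: dict) -> float:
    """Generation throughput from eval_count and eval_duration (nanoseconds)."""
    eval_ns = response["eval_duration"]
    if eval_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / (eval_ns / 1e9)
```

With a server running, `tokens_per_second(generate("llama3.2", "Why is the sky blue?"))` gives a figure that can be compared across versions on the same machine and model.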
Why It Matters
Better GGUF compatibility and Apple Silicon acceleration make local LLM deployment faster and more flexible for developers.