Developer Tools

Ollama v0.30.0 pre-release switches to llama.cpp with native GGUF support

Ollama's latest pre-release brings native llama.cpp and GGUF support for faster inference.

Deep Dive

Ollama, the popular local LLM runtime, has released v0.30.0-rc28, a major pre-release that fundamentally changes its architecture. Instead of building on top of GGML, Ollama now integrates directly with llama.cpp, the C/C++ inference library widely used to run open-source models locally. The shift also brings native support for GGUF, the standard file format for quantized models in the llama.cpp ecosystem. On Apple Silicon, Ollama now uses MLX (Apple's machine learning framework) to accelerate inference, with the promise of better performance on Macs.
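For readers unfamiliar with GGUF, the sketch below shows how a file can be recognized as GGUF from its fixed header. It is an illustration, not Ollama's or llama.cpp's loader. It assumes a little-endian GGUF file of version 2 or later, where the 4-byte magic "GGUF" is followed by a 32-bit version and two 64-bit counts. The names `GGUFHeader` and `read_gguf_header` are hypothetical.

```python
import struct
from typing import BinaryIO, NamedTuple

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Read the fixed-size GGUF header from a binary stream.

    Assumption: little-endian layout of GGUF v2 or later
    (uint32 version, uint64 tensor count, uint64 metadata key/value count).
    """
    magic = stream.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic bytes are {magic!r}")
    fixed = stream.read(20)  # 4 + 8 + 8 bytes
    if len(fixed) != 20:
        raise ValueError("truncated GGUF header")
    version, tensor_count, kv_count = struct.unpack("<IQQ", fixed)
    return GGUFHeader(version, tensor_count, kv_count)
```

To use it, open a model file in binary mode (for example `open("model.gguf", "rb")`, where the path is a placeholder) and pass the file object to `read_gguf_header`.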

While this pre-release offers significant improvements, it comes with caveats. The team lists two known issues: laguna-xs.2 and llama3.2-vision are not yet supported. Developers are encouraged to test the release on their own hardware and report any performance regressions, new crashes, or changes in memory utilization. Installation is via curl (macOS/Linux) or PowerShell (Windows), with the OLLAMA_VERSION environment variable selecting the pre-release build. The release has drawn more than 155 reactions, indicating strong community interest in this architectural overhaul.
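One way to check for performance regressions is to compare generation throughput before and after upgrading. The sketch below assumes a local Ollama server on the default port 11434 and that the pre-release keeps the existing `/api/generate` response fields `eval_count` and `eval_duration` (reported in nanoseconds). The helper names and the model name in the usage comment are placeholders.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Compute generation throughput from an /api/generate response.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming generate request to a local Ollama server."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# Example usage (requires a running server and a pulled model;
# "some-model" is a placeholder):
#   result = generate("some-model", "Explain GGUF in one sentence.")
#   print(f"{tokens_per_second(result):.1f} tokens/s")
```

Running the same prompt and model on the previous release and on the pre-release gives two comparable numbers to include in a regression report. Averaging several runs reduces noise from model loading and caching.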

Key Points
  • Integrates directly with llama.cpp instead of building on GGML, with native GGUF support for broader model compatibility
  • MLX acceleration added for Apple Silicon, improving inference speed on Macs
  • Known issues: laguna-xs.2 and llama3.2-vision not supported in this pre-release

Why It Matters

Moving to native llama.cpp support gives people running models locally access to more GGUF models and the prospect of faster inference.