Developer Tools

Ollama v0.30.0 shifts to llama.cpp, adds GGUF support

Ollama's v0.30.0 pre-release supports llama.cpp directly for full GGUF compatibility and adds MLX acceleration on Apple Silicon.

Deep Dive

Ollama, the popular local LLM runner with 173k GitHub stars, has released v0.30.0 as a pre-release. The version makes a fundamental architecture change: it now supports llama.cpp directly instead of building on top of GGML, which enables full compatibility with the GGUF file format. On Apple Silicon, it adds MLX acceleration to speed up model inference. The team is asking for feedback on performance, errors, crashes, and memory utilization.
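
For readers who want to report performance numbers, a minimal sketch of one way to derive decode throughput is shown here. It assumes the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama's generate and chat responses include; the helper name `tokens_per_second` and the sample values are hypothetical, not measurements from this release.

```python
def tokens_per_second(response: dict) -> float:
    """Decode throughput from an Ollama-style response dict.

    Assumes `eval_count` is the number of generated tokens and
    `eval_duration` is the generation time in nanoseconds.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # Convert nanoseconds to seconds, then divide tokens by seconds.
    return eval_count / (eval_duration_ns / 1e9)


# Hypothetical metrics for illustration only, not a real benchmark.
sample = {"eval_count": 200, "eval_duration": 4_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")
```

Running the same prompt on the previous and the pre-release version and comparing this figure gives a rough, like-for-like number to include in feedback.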

Known limitations: laguna-xs.2 is not supported on Windows or Linux, llama3.2-vision is not yet supported, and nomic-embed-text now converts all inputs to lowercase (this matches the model card and fixes a prior bug). Installation requires pinning version 0.30.0-rc31 via curl or PowerShell. The release aims to modernize Ollama's backend while keeping it easy to run LLMs locally.
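
The practical effect of the nomic-embed-text change is that inputs differing only in letter case become the same input to the model. A small sketch of how a caller might check an existing corpus for such collisions follows. It uses Python's `str.lower()` as a stand-in for the lowercasing step, which is an assumption: the exact normalization Ollama applies may differ, and the helper name `group_by_lowercase` is hypothetical.

```python
def group_by_lowercase(texts: list[str]) -> dict[str, list[str]]:
    """Group inputs that collapse to the same string once lowercased.

    str.lower() is an assumed approximation of the model-side
    lowercasing, not Ollama's actual implementation.
    """
    groups: dict[str, list[str]] = {}
    for text in texts:
        groups.setdefault(text.lower(), []).append(text)
    return groups


corpus = ["Apple", "apple", "APPLE", "Banana"]
groups = group_by_lowercase(corpus)

# Report only the keys where more than one original input collides.
for key, originals in groups.items():
    if len(originals) > 1:
        print(f"{key!r} <- {originals}")
```

Any group with more than one entry would be expected to produce identical embeddings after the change, which matters for pipelines that relied on case to distinguish entries.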

Key Points
  • Architecture switched from GGML to direct llama.cpp support for full GGUF compatibility
  • MLX acceleration added for Apple Silicon, improving inference speed
  • Known issues: laguna-xs.2 unsupported on Windows/Linux, llama3.2-vision not yet supported, nomic-embed-text now lowercases inputs

Why It Matters

Better GGUF compatibility and Apple Silicon acceleration make local LLM deployment faster and more flexible for developers.