Developer Tools

Ollama v0.30.0 shifts to llama.cpp, unlocks GGUF and MLX

New architecture builds directly on llama.cpp rather than on GGML, and uses MLX to accelerate models on Apple Silicon.

Deep Dive

Ollama's v0.30.0 pre-release shifts its architecture to support llama.cpp directly instead of building on top of GGML, which adds compatibility with the GGUF file format. MLX now accelerates model inference on Apple Silicon. The team is asking testers to report performance improvements or regressions, new errors or crashes, and changes in memory utilization. Two models are known to be unsupported in this pre-release: laguna-xs.2 and llama3.2-vision.
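
For readers who want to send the team a concrete performance number, here is a minimal sketch of one way to measure generation throughput. It assumes a local Ollama server on the default port and relies on the `eval_count` and `eval_duration` (nanoseconds) fields of a non-streaming `/api/generate` response. The helper names `tokens_per_second` and `run_prompt` are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Assumption: the server listens on Ollama's default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Compute generation throughput from a non-streaming response.

    eval_count is the number of generated tokens and eval_duration is
    the time spent generating them, in nanoseconds.
    """
    duration_ns = response.get("eval_duration", 0)
    if duration_ns <= 0:
        raise ValueError("response has no usable eval_duration")
    return response["eval_count"] * 1e9 / duration_ns


def run_prompt(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming generate request and return the parsed JSON."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

Calling `tokens_per_second(run_prompt(model, prompt))` with the same model and prompt on the previous release and on the pre-release gives two comparable figures. Averaging several runs reduces noise from model loading and caching.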

Key Points
  • Builds directly on llama.cpp instead of on GGML, adding GGUF file support.
  • Adds MLX acceleration for model inference on Apple Silicon.
  • Pre-release: llama3.2-vision and laguna-xs.2 are not supported; feedback requested.
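
Since GGUF compatibility is the headline change, a quick way to confirm that a local model file really is GGUF may be useful before testing. The sketch below reads the file header, which in the GGUF format begins with the four magic bytes `GGUF` followed by a 32-bit version number. It assumes the common little-endian layout, and the function name `read_gguf_version` is hypothetical.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_version(path: str) -> int:
    """Return the GGUF format version, or raise ValueError if not GGUF.

    Assumption: the file uses little-endian byte order, the usual case.
    """
    with open(path, "rb") as f:
        header = f.read(8)  # 4 magic bytes + 4-byte version
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not start with the GGUF magic bytes")
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

A file that fails this check is not in GGUF format, whatever its extension says.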

Why It Matters

Direct GGUF support and MLX acceleration promise faster, more compatible local LLM deployment, which matters to developers running models on consumer hardware.