Developer Tools

Ollama v0.30.0 release candidate switches to llama.cpp, adds GGUF and MLX support

A pre-release rewrite of the inference backend targets faster inference, with MLX acceleration on Apple Silicon.

Deep Dive

Ollama has released v0.30.0-rc23, a pre-release that integrates llama.cpp directly instead of building on GGML and now works with the GGUF file format. On Apple Silicon, MLX accelerates model inference. Because this is a release candidate, the team is asking for feedback in three areas: performance improvements or regressions, errors or crashes, and changes in memory utilization. Known limitations: laguna-xs.2 and llama3.2-vision are not yet supported.
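
For readers who want to send the performance feedback the team is asking for, a rough sketch of measuring generation speed follows. It assumes a local Ollama server on the default port and relies on the `eval_count` and `eval_duration` (nanoseconds) fields of the `/api/generate` response; whether the pre-release reports these fields unchanged is an assumption, and the helper names `tokens_per_second` and `run_once` are illustrative only. It does not measure memory utilization.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on the default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(stats: dict) -> float:
    """Compute generation speed from an /api/generate response.

    eval_count is the number of generated tokens and eval_duration is
    the generation time in nanoseconds. Returns 0.0 if either is missing.
    """
    duration_ns = stats.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    return stats.get("eval_count", 0) * 1e9 / duration_ns


def run_once(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming generate request and return the parsed JSON."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `tokens_per_second(run_once("your-model", "Why is the sky blue?"))` on both the current release and the release candidate, with the same model and prompt, gives a like-for-like number to include in a report; the model name here is a placeholder.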

Key Points
  • The Ollama v0.30.0 pre-release (rc23) replaces the GGML-based build with direct llama.cpp integration; the team is still collecting reports on whether performance improves or regresses.
  • GGUF file format support makes Ollama compatible with models distributed as GGUF files.
  • MLX accelerates inference on Apple Silicon (Macs with M-series chips).

Why It Matters

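If the release candidate holds up, direct llama.cpp integration, GGUF support, and MLX acceleration should make self-hosted models faster and more broadly compatible, and therefore more practical for professional use.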