Developer Tools

Ollama v0.30.0 shifts to llama.cpp, adds GGUF support and MLX acceleration

Major architecture overhaul: now with native llama.cpp support and Apple Silicon boost.

Deep Dive

Ollama's latest pre-release (v0.30.0) marks a significant architectural shift: the local AI model runner now integrates llama.cpp directly instead of building on top of GGML. The change brings full compatibility with GGUF, the de facto standard file format for quantized models, and introduces MLX acceleration on Apple Silicon Macs. The release is meant to deliver faster inference, better memory utilization, and broader model interoperability, all of which matter to developers running large language models locally.
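
For readers unfamiliar with GGUF, the sketch below shows how a file identifies itself. It assumes the publicly documented v2/v3 header layout (the 4-byte magic "GGUF", then a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key/value count). The names read_gguf_header and GGUFHeader are hypothetical helpers written for this illustration; they are not part of Ollama or llama.cpp.

```python
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    """Fixed-size fields at the start of a GGUF file (hypothetical helper type)."""
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Parse the GGUF header, assuming the little-endian v2/v3 layout."""
    magic = stream.read(4)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file: magic={magic!r}")
    # uint32 version, uint64 tensor count, uint64 metadata key/value count.
    raw = stream.read(20)
    if len(raw) != 20:
        raise ValueError("truncated GGUF header")
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw)
    return GGUFHeader(version, tensor_count, kv_count)
```

To try it, open a model file in binary mode and pass the handle to read_gguf_header, for example open("model.gguf", "rb"), where the path is a placeholder. The quantization type and architecture live in the metadata entries that follow the header, which this sketch does not parse.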

The v0.30.0 pre-release is available for testing on Mac, Linux, and Windows via the standard install script with the version flag set. Known limitations include missing support for some models, such as laguna-xs.2 and llama3.2-vision. The team asks users to report performance improvements or regressions, new errors, and changes in memory usage. The release lays the groundwork for future updates built on the new llama.cpp-based architecture.
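
One way to put a number on a performance improvement or regression is to compare generation throughput before and after upgrading. The sketch below is a minimal example, assuming a local Ollama server on its default port (11434) and the eval_count and eval_duration fields (the latter in nanoseconds) that the /api/generate endpoint returns for non-streaming requests. The helper names tokens_per_second and generate are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from eval_count and eval_duration (nanoseconds)."""
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] * 1e9 / duration_ns


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming generate request and return the parsed JSON."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

Calling tokens_per_second(generate("llama3.2", "Explain GGUF in one sentence.")) on the old and new versions, with the same model and prompt, gives two figures to compare; the model name here is only an example. A single run is noisy, so averaging several runs after the model has loaded would make a report more useful.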

Key Points
  • Architecture switched from GGML to direct llama.cpp support for improved performance and compatibility.
  • Added GGUF file format support, enabling broader model interoperability.
  • MLX acceleration now available for Apple Silicon, boosting inference speed on Macs.

Why It Matters

Streamlines local AI model deployment with better performance and broader model support.