Developer Tools

Ollama v0.30 boosts performance with llama.cpp and adds GGUF support

Run GGUF models from Hugging Face, with up to 2x faster inference on NVIDIA GPUs.

Deep Dive

Ollama v0.30.0 is now available, bringing performance and compatibility upgrades through llama.cpp integration. The update enhances the existing MLX engine on Apple Silicon and delivers faster inference on NVIDIA GPUs. Users can now run GGUF-based models directly from Hugging Face, along with their own fine-tuned models. The release also fixes nomic-embed-text, which now lowercases inputs as the model's specification requires.

However, there are known limitations: the laguna-xs.2 model is not yet supported on Windows or Linux, and llama3.2-vision support is missing entirely. These caveats aside, Ollama 0.30 marks a meaningful step forward for local model deployment, especially for developers looking to experiment with the latest open-weight models on commodity hardware.
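
For readers who want to try a Hugging Face GGUF model from code, the following is a minimal Python sketch rather than anything taken from the release notes. It assumes the `hf.co/{user}/{repo}` model naming that Hugging Face documents for Ollama, Ollama's default local endpoint at `http://localhost:11434`, and its `/api/generate` route. The helper names (`hf_model_ref`, `build_generate_request`, `generate`) and the repository name in the usage comment are hypothetical placeholders.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def hf_model_ref(user, repo, quant=None):
    """Build a model name in the hf.co/{user}/{repo}[:{quant}] form."""
    ref = f"hf.co/{user}/{repo}"
    return f"{ref}:{quant}" if quant else ref


def build_generate_request(model, prompt):
    """Return the JSON payload for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model, prompt, url=OLLAMA_URL):
    """POST the prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["response"]


# Usage (hypothetical repository name; requires a running Ollama server):
#   model = hf_model_ref("example-user", "example-model-GGUF", "Q4_K_M")
#   print(generate(model, "Say hello in one sentence."))
```

The sketch expects the model to be available locally already, for example after pulling it with the Ollama CLI. The payload builder is kept separate from the network call so the request shape can be inspected without a server.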

Key Points
  • llama.cpp integration improves performance across Apple Silicon and NVIDIA hardware
  • Now supports GGUF-based models from Hugging Face and custom fine-tuned models
  • Fixes nomic-embed-text to lowercase inputs per the model card; laguna-xs.2 is not yet supported on Windows or Linux, and llama3.2-vision is not supported at all

Why It Matters

With v0.30, Ollama makes local inference faster and more versatile, opening up GGUF models from Hugging Face to a wider range of hardware.