Ollama v0.23.3 improves MLX stability and Apple Silicon inference
New Ollama release fixes timeout issues and refines model push for Apple MLX...
Ollama, the popular open-source tool for running large language models locally, has released v0.23.3 with a focus on its MLX backend (MLX is Apple's open-source array framework for machine learning on Apple Silicon). The update refines model push behavior for MLX, which should make transfers smoother when syncing models. It also updates the MLX imagegen runner to support thread affinity, which can improve performance by binding threads more consistently to CPU cores. Additionally, the release avoids a status timeout during inference, a fix that prevents model responses from stalling on Apple Silicon hardware. Another patch addresses macOS 26 target leakage in the v3 metallib, which helps keep the compiled Metal library compatible across macOS versions.
The release also hardens integration tests and update flows, making the update process more reliable. Contributors pdevine and dhiltgen, both regular committers to the project, led these changes. While not a major feature release, v0.23.3 delivers tangible stability improvements for users running models locally on Macs, especially those relying on Apple's MLX for performance. For professionals running Ollama in production or research environments, this update reduces the risk of timeouts and compatibility issues, making local AI inference more dependable.
- Refined model push behavior for the MLX backend to improve reliability when syncing local models
- Added thread affinity for the imagegen runner on Apple Silicon to optimize CPU core utilization
- Fixed a status timeout during inference and resolved macOS 26 target leakage in the v3 metallib
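For readers who want to confirm whether a local install already carries these fixes, here is a minimal Python sketch. It assumes Ollama is listening on its default local address (http://localhost:11434) and relies on the server's /api/version endpoint; the helper names (parse_version, has_fixes, local_ollama_version) are illustrative and not part of Ollama itself.

```python
import json
import urllib.request

# Release discussed in this article; anything at or above it should include the fixes.
FIXED_IN = (0, 23, 3)


def parse_version(text):
    """Turn '0.23.3' or 'v0.23.3-rc1' into a tuple of ints such as (0, 23, 3).

    Pre-release and build suffixes are dropped, so a release candidate
    compares equal to its final release. That is good enough for a rough check.
    """
    core = text.strip().lstrip("v").split("-")[0].split("+")[0]
    return tuple(int(part) for part in core.split("."))


def has_fixes(version_text, minimum=FIXED_IN):
    """Return True if the given version string is at least `minimum`."""
    return parse_version(version_text) >= minimum


def local_ollama_version(base_url="http://localhost:11434"):
    """Ask a running Ollama server for its version string.

    Assumption: the server is reachable at the default local address.
    """
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        return json.load(resp)["version"]


if __name__ == "__main__":
    try:
        version = local_ollama_version()
    except OSError as err:  # covers connection refused, timeouts, URL errors
        print(f"Could not reach Ollama: {err}")
    else:
        verdict = "includes the fixes" if has_fixes(version) else "predates v0.23.3"
        print(f"Ollama {version} {verdict}")
```

Running `ollama --version` in a terminal gives the same information without any code; the script is mainly useful when checking several machines or a remote host.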
Why It Matters
Stability fixes for Ollama on Apple Silicon ensure smoother local AI inference for professionals.