Ollama v0.23.3 improves mlx stability and Apple Silicon inference

Ollama Releases May 13, 2026

New Ollama release fixes timeout issues and refines model push for Apple MLX...

Deep Dive

Ollama, the popular open-source tool for running large language models locally, has released v0.23.3, which focuses on the mlx backend, built on MLX, Apple's machine learning framework for Apple Silicon. The update refines model push behavior, which is intended to make transfers more reliable when syncing models. It also updates the mlx imagegen runner to support thread affinity, which can improve performance by binding threads to specific CPU cores. The release additionally avoids a status timeout during inference, a fix meant to prevent stalled model responses on Apple Silicon hardware. A further patch fixes macOS 26 target leakage in the v3 metallib, which should help compatibility with newer macOS versions.

The release also hardens the integration tests and update flows, making the update process more reliable. Contributors pdevine (likely the Ollama CTO) and dhiltgen (a regular contributor) led these changes. Although it is not a major feature release, v0.23.3 delivers stability improvements for users running models locally on Macs, especially those using Apple's MLX for performance. For professionals running Ollama in production or research environments, the update reduces the risk of timeouts and compatibility issues, making local AI inference more dependable.
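For readers who want to confirm that a local install already includes these fixes, the following is a minimal sketch, not an official Ollama utility. It assumes the server is listening on its default address (http://localhost:11434) and uses the `/api/version` endpoint of Ollama's REST API. The helper names `parse_version`, `includes_release`, and `local_ollama_version` are hypothetical, chosen only for illustration.

```python
import json
import urllib.request

# Release discussed in this article, as a comparable tuple.
TARGET = (0, 23, 3)


def parse_version(text):
    """Turn '0.23.3' or 'v0.23.3-rc1' into a tuple of ints such as (0, 23, 3).

    Assumption: any pre-release or build suffix is simply dropped, so a
    release candidate compares equal to the final release.
    """
    core = text.strip().lstrip("v").split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def includes_release(version_text, target=TARGET):
    """Return True if version_text is at least the target release."""
    return parse_version(version_text) >= target


def local_ollama_version(base_url="http://localhost:11434"):
    """Ask a running Ollama server for its version string.

    Assumption: the server is reachable at base_url (the default port).
    """
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        return json.load(resp)["version"]
```

With a server running, `includes_release(local_ollama_version())` reports whether the install is at v0.23.3 or later. The comparison is kept separate from the network call so it can be used on a version string obtained some other way.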

Key Points

Refined model push behavior for mlx backend to improve reliability when syncing local models
Added thread affinity for imagegen runner on Apple Silicon to optimize CPU core utilization
Fixed status timeout during inference and resolved macOS 26 target leakage in v3 metallib

Why It Matters

Stability fixes for Ollama on Apple Silicon ensure smoother local AI inference for professionals.
