Open Source

M5 Max just arrived - the first benchmarks are in

Independent tests show Apple's new M5 Max chip running AI models as large as 122 billion parameters directly on a MacBook Pro, with generation speeds of 54-79 tokens per second across the models tested.

Deep Dive

Independent AI researcher 'cryingneko' has published the first comprehensive benchmarks of Apple's new M5 Max chip running large language models locally. Using Apple's MLX framework in a fresh Python virtual environment, the tests focused on raw performance metrics for models up to the 122-billion-parameter Qwen3.5-122B-A10B-4bit. The results show the 128GB M5 Max MacBook Pro achieving generation speeds of 54-79 tokens/second across the different model sizes, with prompt processing reaching up to 1,887 tokens/second.
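For anyone who wants to attempt a similar run, a minimal sketch using the mlx_lm package looks like the following. The checkpoint name is copied from the article and assumed to be available as an MLX conversion; the prompt and token budget are arbitrary placeholders, not the researcher's actual test inputs.

    # One-time setup in a fresh virtual environment:
    #   python -m venv mlx-bench && source mlx-bench/bin/activate
    #   pip install mlx-lm

    from mlx_lm import load, stream_generate

    # Model name taken from the article; assumes a 4-bit MLX-converted
    # checkpoint by this name exists locally or on the Hugging Face hub.
    MODEL = "Qwen3.5-122B-A10B-4bit"

    model, tokenizer = load(MODEL)

    messages = [{"role": "user", "content": "Explain unified memory in one paragraph."}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    # stream_generate yields one GenerationResponse per decoded token.
    for response in stream_generate(model, tokenizer, prompt, max_tokens=256):
        print(response.text, end="", flush=True)
    print()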

The benchmarks show the M5 Max handling context windows up to 32,768 tokens while maintaining consistent performance. Peak memory usage stayed under 90GB even for the largest model tested, indicating efficient memory management. The tests used mlx_lm's stream_generate function after initial BatchGenerator runs showed suboptimal speeds, a reminder that framework configuration materially affects throughput.
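One rough way to reproduce throughput and memory numbers like these, sketched below, is to read the statistics that recent mlx_lm versions attach to each streamed response; the final response carries cumulative prompt and generation tokens-per-second plus peak memory. The field names reflect current mlx_lm releases rather than the researcher's exact harness, and the repeated dummy prompt is a stand-in for a real ~32K-token context.

    from mlx_lm import load, stream_generate

    # Checkpoint name from the article; assumed available as an MLX conversion.
    model, tokenizer = load("Qwen3.5-122B-A10B-4bit")

    # Stand-in for a large context; a real benchmark would fill this with
    # roughly 32K tokens of representative text.
    prompt = "The quick brown fox jumps over the lazy dog. " * 1500

    last = None
    for last in stream_generate(model, tokenizer, prompt, max_tokens=512):
        pass  # discard the generated text; we only want the trailing stats

    # GenerationResponse fields in recent mlx_lm releases:
    print(f"Prompt:      {last.prompt_tps:.1f} tok/s over {last.prompt_tokens} tokens")
    print(f"Generation:  {last.generation_tps:.1f} tok/s over {last.generation_tokens} tokens")
    print(f"Peak memory: {last.peak_memory:.1f} GB")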

These results position Apple Silicon as a serious contender for local AI development, potentially disrupting traditional cloud-based AI workflows. The ability to run 122B-parameter models locally at usable speeds could accelerate research and development while cutting cloud costs for individual developers and small teams.

Key Points
  • M5 Max runs the 122B-parameter Qwen model at 60-65 tokens/sec generation speed
  • Prompt processing hits 1,887 tokens/sec using Apple's MLX framework with optimized configuration
  • Peak memory usage stays under 90GB even with 32K context windows on the 128GB machine

Why It Matters

Enables developers to run enterprise-scale AI models locally, reducing cloud costs and latency while increasing privacy and control.