Gemma 2 27B - MacBook Pro M5 Max. Averaging around 81 tok/sec
Google's open model runs blazing fast on Apple's unreleased chip, hinting at powerful on-device AI.
A viral benchmark has surfaced showing Google's Gemma 2 27B-parameter model running at an average speed of 81 tokens per second on what appears to be a prototype MacBook Pro equipped with Apple's next-generation M5 Max chip. The user 'Bderken' reported that the model draws approximately 114 watts at peak during short inference bursts, indicating efficient performance for a model of this size. This leak provides the first concrete performance numbers for the unannounced M5 Max silicon, suggesting significant AI inference improvements over the current M4 generation.
The benchmark highlights a major trend toward powerful on-device AI. Running a 27-billion-parameter model like Gemma 2 at usable speeds on a laptop removes the latency, cost, and privacy concerns associated with cloud-based AI. For developers and professionals, this means the potential to build and run sophisticated AI agents, coding assistants, and creative tools entirely locally on a portable machine. The performance, if accurate, positions future MacBooks as serious contenders for AI development workstations.
While the source is an unofficial Reddit post and the hardware is not yet released, the numbers align with expected generational leaps in Apple's custom silicon. The efficiency (high token throughput at a manageable power draw) is a key metric for practical, all-day use. This leak intensifies competition in the edge AI hardware space, challenging other chipmakers and setting new expectations for what is possible in a consumer laptop form factor.
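The leaked figures allow a rough energy-per-token estimate. The sketch below is a back-of-the-envelope calculation using only the reported numbers (81 tok/s average throughput, ~114 W peak draw); it treats the peak draw as sustained, which overstates real energy use, since average draw during inference would be lower.

```python
# Back-of-the-envelope efficiency estimate from the leaked benchmark.
# Assumption: the ~114 W peak draw is sustained for the whole burst,
# so this is an upper bound on energy per token.
throughput_tps = 81.0   # tokens per second (reported average)
peak_power_w = 114.0    # watts (reported peak)

joules_per_token = peak_power_w / throughput_tps
print(f"~{joules_per_token:.2f} J per token")  # ≈ 1.41 J/token

# How far a hypothetical 100 Wh laptop battery would stretch at that rate:
battery_wh = 100.0
battery_j = battery_wh * 3600.0  # 1 Wh = 3600 J
tokens_on_battery = battery_j / joules_per_token
print(f"~{tokens_on_battery:,.0f} tokens on a full charge")
```

Even under this pessimistic assumption, a full charge covers hundreds of thousands of tokens, which is consistent with the article's framing of the chip as viable for all-day local inference.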
- Gemma 2 27B hits ~81 tokens/sec on a MacBook Pro with an unreleased M5 Max chip.
- Peak power draw is reported at ~114 watts, indicating efficient performance for a large model.
- The leak suggests powerful, local LLM execution is becoming viable on consumer laptops.
Why It Matters
Enables professional-grade, private AI development and deployment directly on portable hardware, reducing cloud reliance.