Developer Tools

Apple Silicon local inference costs 3x more than OpenRouter, analysis finds

Running Gemma 4 locally on an M5 MacBook Pro costs ~$1.50 per million tokens...

Deep Dive

A detailed cost breakdown from the Offline Agentic Coding blog finds that running large language models locally on Apple Silicon is significantly more expensive and slower than using cloud APIs like OpenRouter. The analysis focuses on the M5 Max MacBook Pro with 64GB RAM ($4,299) running Gemma 4 31b, a model described as close to Anthropic's Sonnet in performance. At 50–100 watts of power draw and $0.20/kWh, electricity costs roughly $0.01–0.02 per hour. The dominant cost, however, is hardware depreciation: spread over a 5-year lifespan (about 43,800 hours of continuous use), the $4,299 purchase price works out to ~$0.10 per hour. The analysis puts the total cost per million tokens at ~$1.50 at 10 tokens/second, or ~$0.40 at 40 tokens/second.
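The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal sketch, not the blog's own calculation: the function names are made up for illustration, and it assumes the laptop runs inference around the clock for its whole lifespan. With the hourly figures quoted above, it yields per-token costs higher than the ~$1.50 and ~$0.40 cited, so the original post presumably uses different utilization or rounding assumptions. Treat the output as illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: the machine is used 24/7


def hardware_cost_per_hour(price_usd, lifespan_years):
    """Straight-line depreciation of the purchase price over the lifespan."""
    return price_usd / (lifespan_years * HOURS_PER_YEAR)


def electricity_cost_per_hour(watts, usd_per_kwh):
    """Energy cost of one hour at a constant power draw."""
    return watts / 1000 * usd_per_kwh


def cost_per_million_tokens(hourly_cost_usd, tokens_per_second):
    """Convert an hourly running cost into a cost per one million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Inputs taken from the summary above.
hardware = hardware_cost_per_hour(4299, 5)
power_low = electricity_cost_per_hour(50, 0.20)
power_high = electricity_cost_per_hour(100, 0.20)

for tps in (10, 40):
    total = cost_per_million_tokens(hardware + power_high, tps)
    print(f"{tps} tokens/second: ${total:.2f} per million tokens")
```

Because throughput sits in the denominator, a 4x speedup cuts the per-token cost by 4x, which is why the spread between the 10 and 40 tokens/second cases is so large.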

In contrast, OpenRouter offers Gemma 4 31b at $0.38–0.50 per million tokens at 60–70 tokens/second, about 2–7x faster than local inference. The author concludes that for most use cases, cloud services like Anthropic or OpenRouter are the far more sensible choice, especially once a human employee's salary (roughly 1,000x the token cost) is factored in. It is impressive that consumer hardware can run near-Sonnet-level models, but the economics and speed still heavily favor cloud inference.
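The speed and price multiples follow directly from the quoted ranges. As a rough check, the sketch below divides one range by another to get the smallest and largest possible ratio. The helper name `ratio_range` is hypothetical, and the only inputs are the numbers quoted in this summary.

```python
def ratio_range(num_low, num_high, den_low, den_high):
    """Smallest and largest ratio of a numerator range to a denominator range."""
    return num_low / den_high, num_high / den_low


# Cloud throughput (60-70 tokens/second) over local throughput (10-40 tokens/second).
speed_min, speed_max = ratio_range(60, 70, 10, 40)

# Local cost at 10 tokens/second ($1.50) over the cloud price ($0.38-0.50).
price_min, price_max = ratio_range(1.50, 1.50, 0.38, 0.50)

print(f"Cloud is {speed_min:.1f}x to {speed_max:.1f}x faster")
print(f"Local is {price_min:.1f}x to {price_max:.1f}x more expensive")
```

The low end of the speed range comes out at 1.5x, which the summary rounds up to 2x. The price comparison uses the slow local case; at 40 tokens/second the quoted local cost of ~$0.40 falls inside the OpenRouter price range.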

Key Points
  • M5 Max MacBook Pro (64GB, $4,299) runs Gemma 4 31b at 10–40 tokens/second, costing ~$1.50 per million tokens at the slow end and ~$0.40 at the fast end, with hardware amortized over 5 years.
  • OpenRouter charges $0.38–0.50 per million tokens for the same model at 60–70 tokens/second, making it about 3x cheaper than the ~$1.50 local figure and 2–7x faster.
  • Hardware depreciation dominates costs: electricity is only ~$0.01–0.02/hour, while hardware adds ~$0.10/hour over 5 years, and more over shorter lifespans.
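To see how strongly the hardware figure depends on lifespan, the purchase price can be spread over different periods. This is a sketch under stated assumptions: the `utilization` parameter is hypothetical and not part of the original analysis, and it stands for the fraction of all hours the laptop actually spends on inference (1.0 means around the clock).

```python
PRICE_USD = 4299            # M5 Max MacBook Pro price quoted above
HOURS_PER_YEAR = 365 * 24


def hourly_hardware_cost(lifespan_years, utilization=1.0):
    """Purchase price divided by the hours actually spent on inference.

    utilization is an illustrative assumption, not a figure from the source.
    """
    return PRICE_USD / (lifespan_years * HOURS_PER_YEAR * utilization)


for years in (2, 3, 5):
    print(f"{years}-year lifespan: ${hourly_hardware_cost(years):.2f}/hour")
```

Lower utilization raises the effective hourly cost in the same way a shorter lifespan does: a laptop that runs inference only half the time costs twice as much per inference hour.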

Why It Matters

Even on a high-end laptop, local inference costs up to about 3x more per token and runs 2–7x slower than cloud inference, so the cloud remains the more cost-effective option for most workloads.