Apple Silicon costs more than OpenRouter


TLDR

  • Running local inference on an M5 Max MacBook Pro costs roughly 3x more per million tokens than OpenRouter and is 3-7x slower.

Key Takeaways

  • At 10-40 tokens/sec on Gemma4 31B, amortized hardware plus electricity lands at $0.40-$4.79 per million tokens, depending on the assumed throughput, power draw, and hardware lifespan.
  • OpenRouter serves Gemma4 31B at $0.38-$0.50 per million tokens at 60-70 tokens/sec, beating local on both cost and speed in most scenarios.
  • Hardware depreciation dominates the cost equation; electricity at $0.20/kWh adds only ~$0.02/hr at 100W load.
  • Only under the most optimistic assumptions (50W, 40 tok/s, 10-year lifespan) does local inference match OpenRouter pricing.
  • For salaried developers, token cost is roughly 1000x lower than labor cost, making hosted APIs the pragmatic default.
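
The arithmetic behind these takeaways can be sketched in a few lines of Python. This is an illustrative sketch, not the original post's calculation: the function names, the `hardware_usd` purchase price, and the `utilization` factor are assumptions introduced here, since the summary states neither the laptop price nor the duty cycle.

```python
HOURS_PER_YEAR = 365 * 24


def electricity_usd_per_hour(watts: float, usd_per_kwh: float) -> float:
    """Hourly electricity cost for a constant power draw."""
    return watts / 1000.0 * usd_per_kwh


def local_usd_per_million_tokens(
    hardware_usd: float,      # assumed purchase price (not given in the summary)
    lifespan_years: float,    # amortization period, e.g. 3 to 10 years
    watts: float,             # power draw under inference load
    usd_per_kwh: float,       # electricity price
    tokens_per_sec: float,    # sustained output throughput
    utilization: float = 1.0, # assumed fraction of the lifespan spent generating
) -> float:
    """Amortized hardware plus electricity cost per million output tokens."""
    productive_hours = lifespan_years * HOURS_PER_YEAR * utilization
    hardware_per_hour = hardware_usd / productive_hours
    hourly_cost = hardware_per_hour + electricity_usd_per_hour(watts, usd_per_kwh)
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_cost / tokens_per_hour * 1_000_000
```

Calling `electricity_usd_per_hour(100, 0.20)` gives about 0.02, matching the ~$0.02/hr figure above. Because the hardware term is divided by the productive hours, a longer lifespan or higher throughput lowers the per-token cost, which is why only the most optimistic combination approaches hosted pricing.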

Hacker News Comment Review

  • The core methodology drew heavy criticism: the laptop cost should be split between its use as a workstation and as an inference machine, not allocated entirely to token generation.
  • Commenters noted the analysis only counts output tokens; for agentic workloads, input tokens dominate, and local inference treats them as near-free, which shifts the comparison meaningfully.
  • Cloud inference benefits from concurrency: a single GPU serves many users simultaneously, spreading fixed power costs across requests in ways no single local device can match. That makes the cost gap structural rather than incidental.
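
Two of these critiques, splitting the hardware cost with workstation use and spreading fixed costs across concurrent requests, can be illustrated with a small Python sketch. The function name and the `inference_share` and `concurrent_streams` parameters are hypothetical, and the sketch assumes per-stream throughput stays constant as concurrency grows, which real serving systems only approximate.

```python
def shared_usd_per_million_tokens(
    hourly_cost_usd: float,       # fixed hourly cost (hardware amortization plus power)
    tokens_per_sec: float,        # output throughput of one stream
    inference_share: float = 1.0, # assumed fraction of the cost attributed to inference
    concurrent_streams: int = 1,  # assumed number of requests served at once
) -> float:
    """Cost per million output tokens when the fixed cost is shared.

    Simplification: aggregate throughput is modeled as
    tokens_per_sec * concurrent_streams.
    """
    attributed_cost = hourly_cost_usd * inference_share
    tokens_per_hour = tokens_per_sec * concurrent_streams * 3600
    return attributed_cost / tokens_per_hour * 1_000_000
```

Under this model, attributing half the laptop's cost to workstation use halves the local figure, while a server handling several streams at once divides its fixed cost by the number of streams, an option a single-user laptop does not have.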

Notable Comments

  • @antirez: Argues that a 128GB M5 Max used as a primary workstation, which also runs DeepSeek V4 Flash offline, without censorship, and on private data, is a different value proposition from pure token cost.
  • @maho: Points out the post omits input token costs entirely, which dominate typical agentic workloads and are effectively free locally.
