Apple Silicon costs more than OpenRouter


TLDR

  • Running local LLMs on an M5 MacBook Pro costs roughly 3x more per million tokens than OpenRouter, and is 2-7x slower.

Key Takeaways

  • M5 Max MBP at 50-100W and $0.20/kWh costs ~$0.02/hr in electricity; hardware depreciation dominates total cost.
  • At 10-40 tokens/sec on Gemma 4 31B, amortized cost ranges from $0.40 to $4.79 per million tokens depending on lifespan and throughput assumptions.
  • OpenRouter serves Gemma 4 31B at $0.38-$0.50/million tokens at 60-70 tokens/sec, beating local on both price and speed in most scenarios.
  • Only the optimistic case (50W, 40 tok/s, 10-year lifespan) makes Apple Silicon cost-competitive with OpenRouter.
  • For a salaried developer, hosted token costs are roughly 1/1000th of their salary, making hosted inference the pragmatic default.
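The amortization in the second bullet can be reproduced in a few lines. The power draw, throughput, lifespan, and $0.20/kWh figures come from the bullets above; the $4,000 hardware price and the 24/7 duty cycle are my assumptions (the post does not state them), chosen because they land close to the quoted $0.40-$4.79 range.

```python
# Sketch of the amortized cost-per-million-tokens math from the Key
# Takeaways. Assumptions not in the post: $4,000 hardware price,
# machine running 24/7 over its whole lifespan.

HOURS_PER_YEAR = 24 * 365  # 8,760; continuous-operation assumption

def cost_per_million_tokens(hw_price_usd, lifespan_years,
                            power_watts, tok_per_sec,
                            kwh_price_usd=0.20):
    """Amortized hardware + electricity cost per 1M output tokens."""
    depreciation_per_hr = hw_price_usd / (lifespan_years * HOURS_PER_YEAR)
    electricity_per_hr = (power_watts / 1000) * kwh_price_usd
    hours_per_million = 1_000_000 / (tok_per_sec * 3600)
    return (depreciation_per_hr + electricity_per_hr) * hours_per_million

# Optimistic case from the post: 50 W, 40 tok/s, 10-year lifespan
optimistic = cost_per_million_tokens(4000, 10, 50, 40)
# Pessimistic case: 100 W, 10 tok/s, 3-year lifespan
pessimistic = cost_per_million_tokens(4000, 3, 100, 10)

print(f"optimistic:  ${optimistic:.2f}/M tokens")   # ≈ $0.39
print(f"pessimistic: ${pessimistic:.2f}/M tokens")  # ≈ $4.78
```

Under these assumptions the two endpoints come out at about $0.39 and $4.78 per million tokens, essentially matching the post's $0.40-$4.79 range; note how dominant the depreciation term is relative to the ~$0.01-$0.02/hr of electricity.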

Hacker News Comment Review

  • Commenters largely agree local is costlier for pure token throughput, but argue the analysis is flawed: the laptop’s non-inference value is excluded, and the hardware cost should be marginal if the device is already owned.
  • A key methodological hole: the post only counts output tokens. For agentic workloads where input tokens dominate, local inference changes the math significantly since input tokens are nearly free locally.
  • Several commenters pushed back on treating frontier-API pricing as the baseline, noting that OpenRouter's open-model providers are not subsidizing inference the way OpenAI or Anthropic may be, so the price gap reflects genuine scale efficiencies rather than loss-leading.

Notable Comments

  • @maho: Agentic workloads are input-token-heavy; local inference makes those near-free, a cost dimension the post ignores entirely.
  • @antirez: A 128GB M5 Max running DeepSeek V4 flash offline, with no censorship or privacy risk, reframes the value proposition beyond raw token cost.
