Running local LLMs on an M5 MacBook Pro typically costs roughly 3x more per million tokens than using OpenRouter, while generating tokens 2-7x slower.
Key Takeaways
An M5 Max MacBook Pro drawing 50-100W at $0.20/kWh costs only ~$0.01-0.02/hr in electricity; hardware depreciation dominates the total cost.
At 10-40 tokens/sec on Gemma 4 31B, the amortized cost ranges from $0.40 to $4.79 per million output tokens depending on lifespan and throughput assumptions (see the sketch after this list).
OpenRouter serves Gemma 4 31B at $0.38-$0.50/million tokens at 60-70 tokens/sec, beating local on both price and speed in most scenarios.
Only the optimistic case (50W, 40 tok/s, 10-year lifespan) makes Apple Silicon cost-competitive with OpenRouter.
For a salaried developer, hosted token spend runs roughly three orders of magnitude below the cost of their time, making hosted inference the pragmatic default.
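The arithmetic behind these takeaways can be reproduced with a short back-of-envelope script. Below is a minimal sketch; the $4,000 hardware price, 24/7 utilization, and $150k salary are illustrative assumptions not stated above, chosen because they reproduce the quoted $0.40-$4.79 range.

```python
# Back-of-envelope reproduction of the cost figures above.
# ASSUMPTIONS (not stated in the post): $4,000 hardware price,
# machine running inference 24/7 over its whole lifespan.

HOURS_PER_YEAR = 24 * 365  # 8,760

def local_cost_per_m_tokens(hw_price, lifespan_years, watts,
                            tok_per_sec, kwh_price=0.20):
    """Amortized hardware + electricity cost per million output tokens."""
    depreciation_per_hr = hw_price / (lifespan_years * HOURS_PER_YEAR)
    electricity_per_hr = (watts / 1000) * kwh_price
    tokens_per_hr = tok_per_sec * 3600
    return (depreciation_per_hr + electricity_per_hr) / tokens_per_hr * 1e6

# Pessimistic: 100 W, 10 tok/s, 3-year lifespan (the quoted ~$4.79/M)
print(f"${local_cost_per_m_tokens(4000, 3, 100, 10):.2f}")
# Optimistic: 50 W, 40 tok/s, 10-year lifespan -> ~$0.39/M,
# in line with OpenRouter's $0.38-$0.50/M
print(f"${local_cost_per_m_tokens(4000, 10, 50, 40):.2f}")

# Salary comparison (illustrative $150k/yr over ~2,080 work hours):
salary_per_hr = 150_000 / 2080               # ~$72/hr
api_spend_per_hr = 70 * 3600 / 1e6 * 0.50    # ~$0.13/hr at 70 tok/s
print(f"{salary_per_hr / api_spend_per_hr:.0f}x")  # ~570x, same order as ~1000x
```

Under these assumptions, only the optimistic branch lands inside OpenRouter's price band, which is the post's break-even case.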
Hacker News Comment Review
Commenters largely agree that local inference is costlier for pure token throughput, but argue the analysis is flawed: it excludes the laptop's value for everything other than inference, and hardware should be costed at the margin if the device is already owned.
A key methodological hole: the post counts only output tokens. For agentic workloads where input tokens dominate, local inference changes the math significantly, since input tokens are nearly free locally (see the sketch below).
Several commenters pushed back on the objection that hosted pricing is a subsidized, unfair baseline, noting that the open-model providers on OpenRouter are not subsidizing inference the way OpenAI or Anthropic do with their frontier APIs, and that economies of scale explain the price gap.
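To make the input-token point concrete, here is a hedged sketch of a blended cost comparison. The numbers are illustrative assumptions, not from the post or the thread: hosted input and output both priced at a flat ~$0.44/M, local prefill at ~500 tok/s (so input tokens cost roughly $0.10/M even in the pessimistic local case), and an agentic mix of ~95% input tokens.

```python
# ASSUMPTIONS (illustrative, not from the post): hosted charges a flat
# ~$0.44/M for input and output alike; local prefill runs ~500 tok/s, so
# at the pessimistic ~$0.172/hr machine cost, input tokens run roughly
# 0.172 / (500 * 3600) * 1e6 ≈ $0.10/M; the workload is 95% input tokens.

def blended_per_m(input_price, output_price, input_share):
    """Blended $/M total tokens for a given input-token share of traffic."""
    return input_price * input_share + output_price * (1 - input_share)

hosted = blended_per_m(0.44, 0.44, 0.95)  # ~$0.44/M regardless of mix
local = blended_per_m(0.10, 4.79, 0.95)   # ~$0.33/M: pessimistic local wins
print(f"hosted ~${hosted:.2f}/M vs local ~${local:.2f}/M")
```

Under these assumptions, the output-token-only comparison flips for input-heavy agentic traffic, which is exactly the gap the comment identifies.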
Notable Comments
@maho: Agentic workloads are input-token-heavy; local inference makes those near-free, a cost dimension the post ignores entirely.
@antirez: A 128GB M5 Max running DeepSeek V4 flash offline, with no censorship or privacy risk, reframes the value proposition beyond raw token cost.