GitHub Copilot’s shift to usage-based pricing on June 1, 2026 exposes a structural flaw: monthly AI subscriptions have always depended on subsidizing wildly variable token costs.
Key Takeaways
Microsoft was losing $20-$80/user/month on Copilot in 2023; the flat-fee model was unsustainable for three years before this announcement.
A single Copilot “premium request” consumed ~60,000 context tokens plus tool calls, costing ~$11, well above the per-request value implied by the flat plan price.
Reasoning models increased inference costs even as per-token prices for older models fell, breaking the subsidy math further over time.
Anthropic reportedly allowed users to burn ~$8 in compute per $1 of subscription; OpenAI flat plans have similar structural losses.
The author argues monthly subscriptions are fundamentally incompatible with LLMs: one user makes 5 requests, another refactors an entire codebase, yet both pay the same flat fee.
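The subsidy math behind the takeaways above can be sketched in a few lines. The plan price and request counts here are illustrative assumptions, not GitHub's published numbers; the ~$11 per-request cost is the article's estimate.

```python
# Hedged sketch of the flat-fee subsidy math described above.
# PLAN_PRICE is a hypothetical monthly fee; the per-request cost is
# the article's ~$11 estimate (~60k context tokens plus tool calls).
PLAN_PRICE = 10.00                # assumed monthly subscription, USD
COST_PER_PREMIUM_REQUEST = 11.00  # article's estimated provider cost

def monthly_margin(requests_used: int) -> float:
    """Provider margin on one user for one month (negative = subsidy)."""
    return PLAN_PRICE - requests_used * COST_PER_PREMIUM_REQUEST

print(monthly_margin(0))  # 10.0  -> a dormant user is pure margin
print(monthly_margin(1))  # -1.0  -> one premium request already exceeds the fee
print(monthly_margin(5))  # -45.0 -> a light user is deeply subsidized
```

Under these assumptions a single premium request wipes out the monthly fee, which is the "one user makes 5 requests, another refactors a codebase" asymmetry in concrete terms.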
Hacker News Comment Review
The article’s core premise is heavily contested: commenters cite frontier-lab margins reportedly above 80% on tokens, and commodity providers like Kimi K2.6 serving profitably at $4/1M output tokens. On this view, the losses are a deliberate product-layer subsidy decision, not a structural impossibility.
Commenters disagree on the cost trajectory: the article treats inference cost as permanently high or rising, but several argue that 2-5x annual efficiency gains (hardware, quantization, caching, architecture) make flat-fee economics viable once the subsidy phase ends, unlike Uber, where fuel costs are physically bounded.
There is broad agreement on the UX risk of metered billing: users trained on flat fees have no intuition for token burn, agentic sessions can rack up costs invisibly, and this pattern has caused user pain from Slashdot-era hosting bills to overnight CI retry loops.
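The efficiency-gain argument above can be made concrete with a small projection. The starting cost is the article's ~$11 per-request estimate; the 2x and 5x annual decline rates are the commenters' assumptions, not measured data.

```python
# Illustrative projection of per-request inference cost if efficiency
# compounds at the 2-5x annual rates commenters cite. All figures are
# assumptions layered on the article's ~$11 starting estimate.
START_COST = 11.00  # USD per premium request (article's estimate)

def projected_cost(years: int, annual_gain: float) -> float:
    """Per-request cost after `years`, given an `annual_gain`x yearly efficiency improvement."""
    return START_COST / (annual_gain ** years)

for gain in (2.0, 5.0):
    trajectory = [round(projected_cost(y, gain), 2) for y in range(4)]
    print(f"{gain}x/yr: {trajectory}")
# 2.0x/yr: [11.0, 5.5, 2.75, 1.38]
# 5.0x/yr: [11.0, 2.2, 0.44, 0.09]
```

If anything like these rates hold, an $11 request falls under a dollar within two to three years, which is the crux of the "subsidy phase, not structural loss" rebuttal.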
Notable Comments
@joshjob42: Argues frontier labs run ~80% profit margins; Kimi K2.6's profitability at $4/1M output tokens suggests flat-fee losses are policy, not physics.
@wood_spirit: Metered billing for compute has always caused surprise cost overruns; agentic AI sessions with unpredictable token counts are the newest version of this.