An Update on GitHub Availability

TLDR

  • GitHub CTO outlines two recent incidents and a scaling overhaul after agentic workflow growth forced the capacity target from 10X to 30X.

Key Takeaways

  • April 23 merge queue bug: squash merges silently reverted commits from prior PRs, affecting 230 repos and 2,092 pull requests. No data was lost, but default branches were left in an incorrect state.
  • April 27 Elasticsearch overload, likely a botnet attack, knocked out search-backed UI across pull requests, issues, and projects; Git operations and APIs stayed up.
  • Agentic workflows accelerated sharply from late December 2025, pushing capacity needs from 10X (the October 2025 plan) to 30X; a single PR can hit Git storage, Actions, search, webhooks, permissions, caches, and databases simultaneously.
  • Short-term fixes include moving webhooks out of MySQL, redesigning the user session cache, refactoring authentication and authorization to cut database load, and spinning up more compute on Azure.
  • Longer-term work: isolating Git and Actions from other workloads, migrating performance-sensitive Ruby monolith paths to Go, and pursuing a multi-cloud path beyond Azure.

Hacker News Comment Review

  • Commenters read the multi-cloud announcement as a quiet admission that Azure alone cannot deliver the reliability GitHub needs, which undercuts the narrative that the Azure migration was the reliability fix.
  • Skepticism about stated priorities is high: “availability first” was also the declared priority six months ago when the Azure-first pivot was announced, yet outage frequency has not visibly improved, and the post’s graphs lack labeled baselines.
  • A separate thread questions long-term free-tier economics: LLM-generated repo and commit volume is visible in the growth charts, and several commenters doubt GitHub can absorb that load indefinitely without changing access or pricing.

Notable Comments

  • @frangonf: argues the token subsidy era is ending now that training data extraction and agentic lock-in are sufficient, linking to a related thread as evidence.
  • @cedws: “Can’t see them sustaining being a public dustbin for low value projects forever” – flags LLM-wave spam as a concrete free-tier sustainability risk visible in the same growth graphs GitHub is citing.
