Incident with Multiple GitHub Services


TLDR

  • GitHub suffered simultaneous degraded availability across Webhooks, Actions, and Copilot on Apr 23, 2026, with root cause identified by 16:52 UTC.

Key Takeaways

  • Three core services were affected at once: Webhooks (event-driven integrations), Copilot (AI tooling), and Actions (CI/CD pipelines).
  • About 40 minutes passed between the first report (16:12 UTC) and root cause identification (16:52 UTC); mitigation was still in progress at that point.
  • Actions degradation was partial, not a full outage – some jobs completed, some failed, with no clear pattern distinguishing success from failure.
  • GitHub’s status page tracked the incident in real time across four update posts, consistent with their standard incident communication protocol.

Hacker News Comment Review

  • Commenters flagged that partial Actions failures are worse operationally than a full outage: jobs consuming 10+ minutes of runner time before failing waste quota and delay feedback loops.
  • There is growing sentiment that GitHub outages are frequent enough that the platform’s reliability posture is becoming a business risk, not just an inconvenience – a few commenters noted they have already migrated to GitLab for self-hosted CI runners.
  • Discussion touched on SLA math: GitHub would need roughly 16 additional hours of downtime in the 90-day rolling window to drop below two nines (99% availability), which commenters treated as a meaningful threshold worth watching.
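
The SLA arithmetic in the last point can be checked with a short sketch. It assumes "two 9s" means 99% availability measured over a 90-day window of 2,160 hours; the function and constant names are illustrative, not taken from GitHub or the thread. Under those assumptions the total downtime budget is 21.6 hours, so a remaining margin of roughly 16 hours would imply about 5.6 hours already consumed. That consumed figure is derived here, not reported in the source.

```python
def downtime_budget_hours(availability_target: float, window_days: int) -> float:
    """Hours of downtime allowed in the window before the target is breached."""
    window_hours = window_days * 24
    return window_hours * (1.0 - availability_target)


WINDOW_DAYS = 90          # rolling window mentioned in the discussion
TARGET = 0.99             # assumption: "two 9s" means 99% availability
REMAINING_HOURS = 16.0    # approximate margin quoted by commenters

budget = downtime_budget_hours(TARGET, WINDOW_DAYS)

# Derived, not reported: downtime already used if ~16 hours of margin remain.
implied_used = budget - REMAINING_HOURS

# Availability implied by that derived figure over the same window.
implied_availability = 1.0 - implied_used / (WINDOW_DAYS * 24)

print(f"Total budget at {TARGET:.0%}: {budget:.1f} h")
print(f"Implied downtime so far: {implied_used:.1f} h")
print(f"Implied availability: {implied_availability:.2%}")
```

Changing TARGET to 0.999 shows why the two-nines threshold matters: the three-nines budget for the same window is only about 2.2 hours, which the implied downtime would already exceed.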

Notable Comments

  • @AnkerSkallebank: partial failures are harder to handle than clean failures – “kind of wish they would just fail outright, instead of running for 10 minutes and then failing.”
  • @argee: already migrated to GitLab; cites free self-hosted CI runners as a concrete operational advantage over GitHub.
