Incident Report: May 19, 2026 – GCP Account Suspension


TLDR

  • An automated GCP action incorrectly suspended Railway’s production account, cascading into a roughly 8-hour platform-wide outage that took down workloads on GCP, Railway Metal, and AWS.

Key Takeaways

  • The suspension took effect at 22:20 UTC on May 19 and was not fully resolved until 07:58 UTC on May 20. The outage affected the dashboard, API, builds, and all customer workloads.
  • Root architectural flaw: edge proxies depended on a GCP-hosted control plane to populate their routing tables, so once cached routes expired, workloads on Metal and AWS became unreachable even though they were still running.
  • Recovery was sequential and slow: after account access returned, persistent disks, compute instances, and networking each had to be brought back in a separate step.
  • Secondary cascade: GitHub began rate-limiting Railway’s OAuth and webhook integrations as retried requests spiked, blocking logins and builds during recovery.
  • The remediation plan includes decoupling the network control plane from GCP (a true mesh), extending high-availability (HA) database shards across AWS and Metal, and removing GCP from the data plane’s hot path.
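The report summary does not show Railway’s proxy code, so the following is only a minimal, hypothetical Python sketch of the routing failure mode described in the takeaways. It assumes an edge proxy that trusts each cached route for a fixed TTL and must ask the control plane to refresh it on expiry; `RouteCache`, `fetch_route`, `ControlPlaneUnavailable`, the backend address, and the 300-second TTL are all invented for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


class ControlPlaneUnavailable(Exception):
    """Raised by the (hypothetical) control-plane client when it cannot be reached."""


@dataclass
class RouteCache:
    """Hypothetical edge-proxy route cache; not Railway's actual implementation.

    Maps a hostname to a backend address. Each entry is trusted for
    ttl_seconds after it was fetched from the control plane.
    """

    fetch_route: Callable[[str], str]  # asks the control plane for a route
    ttl_seconds: float
    clock: Callable[[], float] = time.monotonic
    _entries: Dict[str, Tuple[str, float]] = field(
        default_factory=dict, init=False, repr=False
    )

    def resolve(self, host: str) -> Optional[str]:
        now = self.clock()
        entry = self._entries.get(host)
        if entry is not None and now - entry[1] < self.ttl_seconds:
            return entry[0]  # fresh hit: the control plane is not consulted
        # Missing or expired entry: the proxy must refresh from the control plane.
        try:
            backend = self.fetch_route(host)
        except ControlPlaneUnavailable:
            # The expired route is dropped, so the host becomes unroutable
            # even though the backend itself may still be healthy.
            self._entries.pop(host, None)
            return None
        self._entries[host] = (backend, now)
        return backend


# Simulated clock and control plane (hypothetical values).
fake_now = [0.0]
control_plane_up = [True]


def fake_fetch(host: str) -> str:
    if not control_plane_up[0]:
        raise ControlPlaneUnavailable(host)
    return "10.0.0.5:8080"  # hypothetical backend on Metal or AWS


cache = RouteCache(fetch_route=fake_fetch, ttl_seconds=300, clock=lambda: fake_now[0])
print(cache.resolve("app.example.com"))  # prints 10.0.0.5:8080 (fetched at t=0)
control_plane_up[0] = False              # control plane becomes unreachable
fake_now[0] = 120.0
print(cache.resolve("app.example.com"))  # prints 10.0.0.5:8080 (still cached)
fake_now[0] = 301.0
print(cache.resolve("app.example.com"))  # prints None (expired, refresh failed)
```

The backend never goes down in this simulation; it becomes unreachable only because nothing can refresh its route. In this model, routing availability is capped by the availability of whatever refreshes the cache, which is the dependency the remediation plan targets.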

Hacker News Comment Review

  • Commenters broadly agree that the architectural single point of failure was self-inflicted: depending on a GCP-hosted control plane negated Railway’s multi-cloud topology, whoever was responsible for the triggering event.
  • The incident report never explains why GCP’s automated system flagged the account; commenters called this the unaddressed root cause, noting that Railway described the symptoms but not the condition that triggered the suspension.
  • Broader concern: automated GCP suspensions of paying B2B customers, with no prior notice or recourse, are a recurring pattern rather than a Railway-specific edge case, and smaller operators have no leverage to get restored quickly.

Notable Comments

  • @Animats: “Google can no longer be trusted as a B2B service provider” – Railway’s own post-mortem plans to demote GCP to a secondary/failover role.
  • @majdalsado: A customer describes an emergency migration from Railway to Azure within hours, citing repeated mishaps while running a B2B enterprise app on the platform.
