GCP incorrectly suspended Railway’s production account via automated action on May 19, causing an 8-hour platform-wide outage across GCP, Railway Metal, and AWS.
Key Takeaways
GCP account suspension at 22:20 UTC took down Railway’s dashboard, API, control plane databases, and GCP compute instantly.
Edge proxies on Railway Metal and AWS stayed up briefly, but relied on a GCP-hosted network control plane to populate routing tables; once route caches expired (~22:35 UTC), all regions returned 404s.
Full recovery took ~8 hours: persistent disks came back by 23:54 UTC, networking by 01:30 UTC on May 20, the dashboard by 02:55 UTC, and full resolution by 07:58 UTC.
GitHub then rate-limited Railway’s OAuth and webhook integrations due to burst retry volume, blocking logins and builds as a secondary failure.
Fixes planned: decouple network control plane from GCP (true mesh), extend HA database shards across AWS and Metal, remove GCP from the data plane hot path.
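The route-cache failure mode in the takeaways above can be illustrated with a minimal sketch. This is not Railway's implementation: `RouteCache`, `ControlPlaneUnavailable`, and the `fetch` callback are hypothetical names, and the 15-minute TTL is only an assumption read off the gap between 22:20 and 22:35 UTC. It shows how a proxy that depends on a remote control plane starts returning "no route" (a 404 at the edge) once its cache expires, and how serving stale entries would change that behavior.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ControlPlaneUnavailable(Exception):
    """Raised by the hypothetical fetcher when the control plane is unreachable."""


class RouteCache:
    """Hypothetical edge-proxy route cache keyed by hostname."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 900.0,  # assumed 15-minute TTL, not a documented value
        serve_stale: bool = False,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._serve_stale = serve_stale
        self._clock = clock
        # host -> (upstream address, expiry timestamp)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, host: str) -> Optional[str]:
        """Return the upstream for host, or None if no route is known (edge 404)."""
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now < entry[1]:
            return entry[0]  # fresh cache hit, control plane not consulted
        try:
            upstream = self._fetch(host)
        except ControlPlaneUnavailable:
            if entry is not None and self._serve_stale:
                return entry[0]  # expired, but better than dropping traffic
            return None
        self._entries[host] = (upstream, now + self._ttl)
        return upstream
```

With `serve_stale=False`, every host goes dark one TTL after the control plane disappears, which matches the timeline described above. Serving stale routes is a trade-off rather than a fix: it keeps traffic flowing but can route to instances that no longer exist, which is one reason removing the single-provider dependency is the more durable remedy.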
Hacker News Comment Review
Skepticism runs alongside sympathy. Several commenters questioned Railway’s own abuse prevention, noting heavy spam from Railway IPs and a free tier that attracts crypto mining and bot operators. That raises the possibility that customer workload abuse triggered GCP’s automated suspension.
GCP’s pattern of automated account actions without proactive human outreach drew comparisons to the 2024 UniSuper deletion incident, reinforcing a persistent reputation risk that many cite as a reason to avoid GCP despite its ergonomics.
Some longtime Railway users said this outage was a breaking point, citing a broader pattern of operational immaturity rather than treating this purely as a GCP fault.
Notable Comments
@jkogara: Noticed Railway displayed new T&C on login post-incident explicitly banning bots, torrents, and illegal workloads, suggesting abuse may have triggered the GCP automated restriction.
@BitWiseVibe: “The amount of spam from Railway IPs is insane” – flags systemic abuse prevention failure as a root contributor.