Railway GCP Account Suspension Incident Report

· cloud · Source ↗

TLDR

  • Google Cloud incorrectly suspended Railway’s production GCP account on May 19, 2026, causing an ~8-hour platform-wide outage across GCP, Railway Metal, and AWS.

Key Takeaways

  • GCP automated action suspended Railway’s account without proactive notice, taking down the dashboard, API, control plane, and all GCP-hosted compute.
  • Railway’s edge proxies cache routing tables from a GCP-hosted control plane; once caches expired, Metal and AWS workloads also returned 404s despite remaining online.
  • Recovery was staged and slow: persistent disks, compute instances, and networking each required separate restoration, extending the outage several hours past account reactivation.
  • GitHub rate-limited Railway’s OAuth and webhook integrations during recovery due to burst retry volume, blocking logins and builds as a secondary cascade.
  • Immediate fixes: decouple route discoverability from GCP control plane to form a true mesh, spread HA database shards across AWS and Metal, and move GCP off the data plane hot path.

Hacker News Comment Review

  • No substantive HN discussion yet.

Original | Discuss on HN