Artemis II Fault Tolerance

TLDR

  • NASA’s Orion spacecraft runs its flight software on eight CPUs organized as fail-silent pairs and can survive the loss of three of its four Flight Control Modules mid-flight.

Key Takeaways

  • Two Vehicle Management Computers each hold two Flight Control Modules (FCMs), and each FCM is a self-checking processor pair, for a total of eight CPUs running in parallel.
  • A silenced FCM resets, re-synchronizes state with live modules, and rejoins the group without requiring a full restart.
  • Triple-modular-redundant memory self-corrects single-bit errors on every read; dual-lane network interface cards catch bit flips before they reach command output.
  • The network itself is triple-redundant, spread across three separate planes, with self-checking switches at every node.
  • A dissimilar Backup Flight Software system runs on different hardware, a different OS, and independently developed code to guard against common-mode software failures.
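The fail-silent pairing, the ability to keep flying after losing three of four FCMs, and the reset-and-rejoin step can be illustrated with a toy Python model. This is a sketch under stated assumptions, not Orion's flight software: the `Fcm` and `FcmGroup` classes, the single-integer `state`, and the lane functions are all hypothetical, and real re-synchronization involves far more state than one number. Each FCM runs two lanes on the same input and goes silent on any disagreement.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical lane model: (state, sensor_input) -> new state. The new state
# doubles as the command the FCM would emit.
Lane = Callable[[int, int], int]


def nominal_lane(state: int, sensor_input: int) -> int:
    return state + sensor_input


@dataclass
class Fcm:
    """One Flight Control Module modeled as a self-checking pair of lanes."""
    name: str
    lane_a: Lane = nominal_lane
    lane_b: Lane = nominal_lane
    state: int = 0
    silent: bool = False

    def step(self, sensor_input: int) -> Optional[int]:
        if self.silent:
            return None
        a = self.lane_a(self.state, sensor_input)
        b = self.lane_b(self.state, sensor_input)
        if a != b:
            # Lanes disagree: go silent rather than emit a suspect command.
            self.silent = True
            return None
        self.state = a
        return a


class FcmGroup:
    def __init__(self, fcms: List[Fcm]) -> None:
        self.fcms = fcms

    def step(self, sensor_input: int) -> Optional[int]:
        outputs = [f.step(sensor_input) for f in self.fcms]
        live = [o for o in outputs if o is not None]
        # A fail-silent FCM only outputs values its lanes agreed on, so any
        # live output is usable; the group keeps commanding while one is live.
        return live[0] if live else None

    def rejoin(self, fcm: Fcm) -> None:
        donor = next(f for f in self.fcms if f is not fcm and not f.silent)
        # Hypothetical reset: restore nominal lanes, then copy state from a
        # live module instead of restarting the whole group.
        fcm.lane_a = nominal_lane
        fcm.lane_b = nominal_lane
        fcm.state = donor.state
        fcm.silent = False


if __name__ == "__main__":
    fcms = [Fcm(f"FCM{i}") for i in range(4)]
    group = FcmGroup(fcms)
    print(group.step(5))  # 5: all four FCMs agree

    def faulty_lane(state: int, sensor_input: int) -> int:
        return state + sensor_input + 1  # simulated fault in one lane

    for f in fcms[:3]:
        f.lane_b = faulty_lane
    print(group.step(2))  # 7: three FCMs go silent, FCM3 keeps commanding

    group.rejoin(fcms[0])
    print(group.step(1))  # 8: FCM0 is back in step with FCM3
```

Taking the first live output is only safe because every member is fail-silent; a member that could emit a wrong value would instead require voting across members.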
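Triple-modular-redundant memory can be illustrated with a bitwise majority vote: keep three copies of each word and, on read, let every bit take the value that at least two copies agree on. The sketch below assumes a simple word-addressed store and a write-back ("scrub") on each read as one way to realize "self-corrects on every read"; `TmrMemory` and its methods are hypothetical and model only the voting logic, not how Orion's hardware implements it.

```python
from typing import List


class TmrMemory:
    """Hypothetical TMR word store: three copies, bitwise majority vote on read."""

    def __init__(self, size: int) -> None:
        self.copies: List[List[int]] = [[0] * size for _ in range(3)]

    def write(self, addr: int, value: int) -> None:
        for copy in self.copies:
            copy[addr] = value

    def read(self, addr: int) -> int:
        a, b, c = (copy[addr] for copy in self.copies)
        # Each bit takes the value held by at least two of the three copies.
        voted = (a & b) | (a & c) | (b & c)
        # Assumed scrub-on-read: write the voted value back so a corrected
        # upset cannot later combine with a second upset in another copy.
        self.write(addr, voted)
        return voted


if __name__ == "__main__":
    mem = TmrMemory(4)
    mem.write(0, 0b1010)
    mem.copies[1][0] ^= 0b0100    # simulated single-bit upset in copy 1
    print(bin(mem.read(0)))       # 0b1010: the vote masks the flipped bit
    print(bin(mem.copies[1][0]))  # 0b1010: copy 1 was repaired by the scrub
```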
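A receiver on a triple-redundant network can be sketched as taking the first intact copy of each frame from any of the three planes and ignoring later duplicates, so traffic keeps flowing even with two planes down. The frame layout, sequence-number deduplication, and the CRC-32 check via Python's `zlib.crc32` are assumptions added for illustration; the source only says the network uses three planes with self-checking switches, and `TriplePlaneReceiver` and `make_frame` are hypothetical names.

```python
import zlib
from typing import Iterable, Optional, Set, Tuple

# Hypothetical frame layout: (sequence number, payload, CRC-32 of payload).
Frame = Tuple[int, bytes, int]


def make_frame(seq: int, payload: bytes) -> Frame:
    return (seq, payload, zlib.crc32(payload))


class TriplePlaneReceiver:
    """Accept the first intact copy of each frame from any of three planes."""

    def __init__(self) -> None:
        self.delivered: Set[int] = set()

    def on_copies(self, copies: Iterable[Optional[Frame]]) -> Optional[bytes]:
        for frame in copies:
            if frame is None:
                continue  # plane down, or a self-checking switch dropped the frame
            seq, payload, crc = frame
            if zlib.crc32(payload) != crc:
                continue  # assumed end-to-end check: discard a corrupted copy
            if seq in self.delivered:
                return None  # already delivered via another plane
            self.delivered.add(seq)
            return payload
        return None


if __name__ == "__main__":
    rx = TriplePlaneReceiver()
    good = make_frame(1, b"cmd:42")
    corrupted = (1, b"cmd:43", good[2])  # payload altered in transit
    print(rx.on_copies([None, corrupted, good]))  # b'cmd:42'
    print(rx.on_copies([good]))                   # None: duplicate of seq 1
```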

Hacker News Comment Review

  • No substantive HN discussion yet.