No way to parse integers in C (2022)

TLDR

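  • Every integer parsing function in the C standard library (atol, strtol, strtoul, sscanf) either mishandles bad input or makes errors hard to detect. strtol is usable for signed types with careful checks, but only std::from_chars in C++ works reliably without workarounds.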

Key Takeaways

  • atol has no way to report errors: it silently ignores trailing garbage ("123abc" parses as 123), and POSIX leaves its behavior undefined when the value is out of range, so it is unsafe on untrusted input.
  • strtol can be used correctly for signed types with careful errno and endptr checks, but requires boilerplate most callers skip.
  • strtoul and sscanf accept a leading minus sign on unsigned conversions and silently wrap the result to a large positive value (strtoul turns "-1" into ULONG_MAX) without reporting any error, making unsigned parsing fundamentally broken.
  • std::from_chars (C++17) explicitly rejects leading minus on unsigned types and reports errors via std::errc, making it the only standard option that works correctly.
  • A 2023 workaround: call strtol first to reject negatives, then call strtoul on the validated input.
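
The strtol checks and the unsigned workaround from the takeaways above can be sketched roughly as follows. This is a minimal illustration, not the article's code: the helper names parse_long and parse_ulong are hypothetical, base 10 is assumed, and leading whitespace or a leading "+" is still accepted because strtol itself accepts them.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 long.
 * Rejects empty input, trailing garbage, and out-of-range values. */
bool parse_long(const char *s, long *out)
{
    char *end;

    errno = 0;                      /* strtol only sets errno, never clears it */
    long v = strtol(s, &end, 10);

    if (end == s)                   /* no digits were consumed */
        return false;
    if (*end != '\0')               /* trailing garbage such as "12abc" */
        return false;
    if (errno == ERANGE)            /* result was clamped to LONG_MIN/LONG_MAX */
        return false;

    *out = v;
    return true;
}

/* Hypothetical helper: parse a base-10 unsigned long using the
 * "strtol first, then strtoul" workaround. */
bool parse_ulong(const char *s, unsigned long *out)
{
    char *end;

    /* Step 1: use strtol only to validate syntax and detect a negative
     * value. An overflowing positive input is clamped to LONG_MAX here,
     * which is still non-negative, so it proceeds to step 2. */
    long probe = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return false;
    if (probe < 0)                  /* note: "-0" yields 0 and is accepted */
        return false;

    /* Step 2: the input is known to be non-negative, so strtoul cannot
     * wrap a negative value; only the range check remains. */
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE)
        return false;

    *out = v;
    return true;
}
```

With these helpers, parse_ulong("-1", &u) returns false, whereas a bare strtoul("-1", NULL, 10) returns ULONG_MAX without setting errno.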

Hacker News Comment Review

  • Commenters broadly agree this is long-known tribal knowledge in C; the common attitude is “just bring your own string library” rather than expecting stdlib to be fixed.
  • OpenBSD’s strtonum was cited as the practical escape hatch that handles these edge cases with a saner API.
  • Debate surfaced over whether the brokenness is a C design flaw or an inherent consequence of prioritizing minimal runtime with no error-propagation convention.
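
To illustrate what a "saner API" might look like, here is a portable sketch modelled on the general shape of strtonum: the caller supplies an allowed range and gets back either a value or an error string. The function parse_in_range is a hypothetical stand-in built on strtoll, not OpenBSD's implementation. It assumes base 10 and a non-NULL errstr, and unlike a careful library routine it does not preserve errno.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical range-checked parser in the style of strtonum.
 * On success returns the value and sets *errstr to NULL.
 * On failure returns 0 and points *errstr at a short description. */
long long parse_in_range(const char *s, long long minval, long long maxval,
                         const char **errstr)
{
    char *end;

    *errstr = NULL;
    if (minval > maxval) {          /* nonsensical range from the caller */
        *errstr = "invalid";
        return 0;
    }

    errno = 0;
    long long v = strtoll(s, &end, 10);

    if (end == s || *end != '\0') { /* empty input or trailing garbage */
        *errstr = "invalid";
        return 0;
    }
    if ((errno == ERANGE && v == LLONG_MIN) || v < minval) {
        *errstr = "too small";
        return 0;
    }
    if ((errno == ERANGE && v == LLONG_MAX) || v > maxval) {
        *errstr = "too large";
        return 0;
    }
    return v;
}
```

A caller parsing a TCP port could write parse_in_range(arg, 1, 65535, &err) and treat any non-NULL err as a rejected input, which covers negatives, overflow, and trailing garbage in a single check.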

Notable Comments

  • @orthoxerox: Describes a semester-long course assignment where students discover edge cases incrementally, ending with a billion nines piped to stdin.
  • @jervant: Points directly to strtonum on OpenBSD as the practical solution stdlib never standardized.
