My Favorite Bugs: Invalid Surrogate Pairs


TLDR

  • A silent data-loss bug in a TipTap/Yjs collaborative editor was traced to lib0 splitting an emoji's surrogate pair during a string splice; the resulting lone surrogate made encodeURIComponent throw, which halted sync.
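
The failure mode can be reproduced in a few lines. This is a minimal sketch, not the editor's or lib0's actual code: the sample string and the tryEncode helper are made up for illustration. It only asserts the error name, because the exact error message text varies between JavaScript engines.

```javascript
// "a😀b": the emoji U+1F600 is stored as two UTF-16 code units, \uD83D \uDE00.
const text = "a\u{1F600}b";

const lengthInCodeUnits = text.length;        // 4: .length counts code units
const lengthInCodePoints = [...text].length;  // 3: string iteration yields code points

// Cutting at index 2 lands between the two halves of the surrogate pair.
const left = text.slice(0, 2);  // "a" followed by a lone high surrogate \uD83D
const right = text.slice(2);    // a lone low surrogate \uDE00 followed by "b"

// Hypothetical helper: capture the outcome instead of letting the error escape.
function tryEncode(s) {
  try {
    return { ok: true, value: encodeURIComponent(s) };
  } catch (err) {
    return { ok: false, name: err.name };
  }
}

const whole = tryEncode(text);   // succeeds: the pair is intact
const broken = tryEncode(left);  // fails: lone surrogates cannot be UTF-8 encoded

console.log(lengthInCodeUnits, lengthInCodePoints);
console.log(whole);
console.log(broken);
```

Both halves of the cut are malformed on their own, so encoding right fails the same way as encoding left.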

Key Takeaways

  • JavaScript strings are sequences of UTF-16 code units; .slice(), .length, and [] indexing all operate on code units, not code points or grapheme clusters, so it is easy to cut an emoji in half.
  • Yjs’s dependency lib0 used .slice() internally; when a CRDT operation landed between the two halves of a surrogate pair, it left an orphaned high surrogate, and encodeURIComponent threw URIError: URI malformed on it.
  • Neither TipTap nor Yjs caught the error, so sync silently stopped while the editor still appeared healthy locally, and edits were lost on reload.
  • Workarounds: a global window.addEventListener("error", ...) handler that regex-matches URIError: URI malformed and prompts a reload, plus offline CRDT persistence as a hedge.
  • Permanent fixes: an upstream lib0 patch that replaces orphaned surrogates with U+FFFD, and modeling emoji as atomic ProseMirror node types so the cursor cannot land inside one. Going forward, use Intl.Segmenter with granularity: "grapheme" for safe string splits.
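
The string-level mitigations above can be sketched roughly as follows. None of this is lib0's, Yjs's, or TipTap's actual code: replaceLoneSurrogates, splitGraphemes, sliceGraphemes, and looksLikeMalformedUri are hypothetical helper names, and the reload prompt is reduced to a console warning because the real UI is app-specific. The regex fallback is only an approximation for runtimes that lack String.prototype.toWellFormed.

```javascript
// Hypothetical helper: replace every unpaired surrogate with U+FFFD.
function replaceLoneSurrogates(s) {
  if (typeof s.toWellFormed === "function") {
    return s.toWellFormed(); // built in on newer runtimes
  }
  // Fallback: a high surrogate not followed by a low one,
  // or a low surrogate not preceded by a high one.
  return s.replace(
    /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g,
    "\uFFFD"
  );
}

// Hypothetical helpers: split and slice by grapheme cluster, not code unit.
function splitGraphemes(s) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(s), (part) => part.segment);
}

function sliceGraphemes(s, start, end) {
  return splitGraphemes(s).slice(start, end).join("");
}

// Workaround sketch: detect the uncaught error so the user can be told to reload.
// The message text is engine-specific; this pattern is the one quoted above.
const URI_MALFORMED = /URIError: URI malformed/;

function looksLikeMalformedUri(message) {
  return URI_MALFORMED.test(String(message));
}

if (typeof window !== "undefined") {
  window.addEventListener("error", (event) => {
    if (looksLikeMalformedUri(event.message)) {
      // Assumption: a real app would show a reload prompt here.
      console.warn("Sync may have stopped; please reload the page.");
    }
  });
}

const sample = "a\u{1F600}b";
const damaged = sample.slice(0, 2);               // ends in a lone high surrogate
const repaired = replaceLoneSurrogates(damaged);  // "a" followed by U+FFFD
const encoded = encodeURIComponent(repaired);     // no longer throws
const safeCut = sliceGraphemes(sample, 0, 2);     // "a" plus the whole emoji

console.log(encoded);
console.log(safeCut);
```

Replacing the orphan with U+FFFD still loses the emoji, but it keeps the string encodable so sync can continue. Slicing by grapheme avoids creating the orphan in the first place, at the cost of segmenting the string on each call.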

Hacker News Comment Review

  • No substantive HN discussion yet.
