How to make SSE token streams resumable, cancellable, and multi-device


TLDR

  • Resumable, cancellable, multi-device LLM token streaming over SSE is technically possible but costly: it requires per-token DB writes, cancel markers in shared state, and polling workarounds.

Key Takeaways

  • Resumable streams via Last-Event-ID require storing every token in a shared DB, since with stateless replicas a reconnect may be routed to a different server than the one that started the stream.
  • Per-token DB writes create heavy write amplification: each token event carries ~125 chars of metadata for a few chars of text delta, and the per-token records are discarded once the full response lands.
  • Cancellations need a separate POST /cancel/{response_id} endpoint writing a cancel marker to shared state; the LLM inference process polls for that marker between tokens.
  • Multi-device support splits into two problems: serving stored tokens to late joiners (solved by the DB) and notifying device B of new prompts from device A (not solved by SSE alone; it requires polling or long-polling).
  • The author works at Ably and argues a pub/sub transport decouples connection lifetime from agent lifecycle, handles rewind/history, and compacts token deltas automatically.
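
The resume and cancel mechanics in the takeaways above can be sketched as follows. This is a minimal illustration, not the article's implementation: an in-memory object stands in for the shared DB, and the names TokenStore, sse_frame, and generate are hypothetical. It assumes event ids are 1-based sequence numbers per response, so a reconnecting client's Last-Event-ID tells any replica where to resume.

```python
from dataclasses import dataclass, field


@dataclass
class TokenStore:
    """In-memory stand-in (assumption) for the shared DB every replica can reach."""
    events: dict[str, list[str]] = field(default_factory=dict)
    cancelled: set[str] = field(default_factory=set)

    def append(self, response_id: str, delta: str) -> int:
        """Persist one token delta and return its 1-based event id."""
        log = self.events.setdefault(response_id, [])
        log.append(delta)
        return len(log)

    def replay(self, response_id: str, last_event_id: int = 0) -> list[tuple[int, str]]:
        """Return (event_id, delta) pairs stored after last_event_id."""
        log = self.events.get(response_id, [])
        return [(i + 1, d) for i, d in enumerate(log) if i + 1 > last_event_id]

    def cancel(self, response_id: str) -> None:
        """Write the cancel marker, as a POST /cancel/{response_id} handler would."""
        self.cancelled.add(response_id)

    def is_cancelled(self, response_id: str) -> bool:
        return response_id in self.cancelled


def sse_frame(event_id: int, delta: str) -> str:
    """Format one SSE event; the client echoes the id back as Last-Event-ID."""
    # A delta containing newlines must be split across several data: lines.
    data = "".join(f"data: {line}\n" for line in delta.split("\n"))
    return f"id: {event_id}\n{data}\n"


def generate(store: TokenStore, response_id: str, tokens) -> int:
    """Simulated inference loop: one DB write per token, with a check of
    the cancel marker between tokens. Returns the number of tokens written."""
    written = 0
    for delta in tokens:
        if store.is_cancelled(response_id):
            break
        store.append(response_id, delta)
        written += 1
    return written
```

A replica handling a reconnect would parse the Last-Event-ID header into an integer, call replay with it, and emit each pair through sse_frame before continuing with live tokens. Every token passes through append, which is the per-token write cost the takeaways describe.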

Hacker News Comment Review

  • No substantive HN discussion yet.
