OpenAI's WebRTC Problem

A veteran who built WebRTC selective forwarding units (SFUs) at both Twitch and Discord argues that WebRTC is fundamentally wrong for voice AI, and lays out why QUIC is the replacement builders should plan for.

What Matters

  • WebRTC aggressively drops audio packets to minimize latency, and retransmission is impossible in the browser, so garbled prompts (and, in turn, garbage LLM responses) are inevitable.
  • TTS generates audio faster than real time (2 s of GPU time → 8 s of audio), but WebRTC renders each packet on arrival with no buffering, which forces OpenAI to inject artificial sleep delays before every packet to pace output at real-time speed.
  • Establishing a WebRTC connection requires a minimum of 8 RTTs; QUIC needs 1 RTT for the full QUIC+TLS handshake.
  • OpenAI’s load balancer uses Redis to map source IP/port → backend; QUIC-LB encodes the backend server ID directly into the CONNECTION_ID, eliminating shared state entirely.
  • WebRTC’s model of one ephemeral port per connection breaks at scale: port limits, firewall blocking, and incompatibility with Kubernetes force everyone to multiplex all connections onto a single port, which breaks the spec.
  • Discord forked WebRTC so aggressively that its native clients implement almost none of the original stack, but its web clients still require the full implementation, which spans roughly 45 RFCs.
  • Near-term pragmatic recommendation: stream audio over WebSockets, and graduate to QUIC/WebTransport once you need the ability to drop packets or to carry video.
  • [HN: @awkii] Calls WebRTC one of the worst protocols to implement: SDP, TURN/STUN, ICE, and the full handshake must all be rebuilt from scratch in every implementation.
  • [HN: @giancarlostoro] WebTransport is the lesser-known alternative to WebRTC that most builders haven’t evaluated.
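
The pacing problem described above can be sketched as a deadline-based sender: frames that arrive from TTS in one burst are released one per frame interval, so a receiver that plays packets on arrival hears them at real-time speed. This is a minimal sketch, not OpenAI's code; the 20 ms frame size, the `send` hook, and the `FakeClock` helper are hypothetical choices for illustration.

```python
import time
from typing import Callable, Iterable

# Assumption: 20 ms audio frames, a common Opus packetization interval.
FRAME_MS = 20


def pace_frames(
    frames: Iterable[bytes],
    send: Callable[[bytes], None],
    frame_ms: int = FRAME_MS,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Release frames no faster than real time.

    Frame i is due at start + i * frame_ms. Sleeping until an absolute
    deadline (rather than a fixed sleep per frame) keeps oversleep errors
    from accumulating over a long utterance.
    """
    start = clock()
    for i, frame in enumerate(frames):
        deadline = start + i * frame_ms / 1000.0
        delay = deadline - clock()
        if delay > 0:
            sleep(delay)  # TTS ran ahead of playback: wait for this frame's slot
        send(frame)  # hypothetical transport hook, e.g. an RTP sender


class FakeClock:
    """Hypothetical deterministic clock for trying the pacer without real sleeps."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds
```

With the fake clock, 400 frames (8 s of audio at 20 ms per frame) that were generated in a single burst are released over about 8 s of simulated time, the last one at roughly 7.98 s.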

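The QUIC-LB point can be illustrated with a simplified sketch loosely modeled on the draft's plaintext mode: each server mints connection IDs that carry its own server ID, so any load-balancer instance can route a packet by parsing the destination connection ID, with no Redis lookup. The layout (8-byte CIDs, a config value in the top bits of the first octet, a 2-byte server ID, a random nonce) and all names here are assumptions for illustration; the actual draft defines configurable lengths and encrypted modes.

```python
import os
from typing import Dict

CID_LEN = 8        # assumption: fixed CID length known to servers and the LB
SERVER_ID_LEN = 2  # assumption: 2-byte server IDs (up to 65,536 backends)
CONFIG_ID = 0      # assumption: config value kept in the first octet's top bits


def new_connection_id(server_id: int) -> bytes:
    """Server side: mint a CID that carries this server's ID in plaintext."""
    first = (CONFIG_ID << 5) | (os.urandom(1)[0] & 0x1F)
    nonce = os.urandom(CID_LEN - 1 - SERVER_ID_LEN)
    return bytes([first]) + server_id.to_bytes(SERVER_ID_LEN, "big") + nonce


def server_id_from_cid(cid: bytes) -> int:
    """Load-balancer side: recover the backend's ID from the CID alone."""
    return int.from_bytes(cid[1 : 1 + SERVER_ID_LEN], "big")


def dcid_from_packet(packet: bytes) -> bytes:
    """Extract the Destination Connection ID from a QUIC v1 packet (RFC 9000)."""
    if packet[0] & 0x80:  # long header: flags(1) | version(4) | dcid_len(1) | dcid
        dcid_len = packet[5]
        return packet[6 : 6 + dcid_len]
    return packet[1 : 1 + CID_LEN]  # short header: length is implied by config


def route(packet: bytes, backends: Dict[int, str]) -> str:
    """Stateless routing: no shared store, no per-connection table."""
    return backends[server_id_from_cid(dcid_from_packet(packet))]
```

Because routing depends only on the connection ID, a client whose IP address or port changes (NAT rebinding, switching networks) still reaches the same backend, which a map keyed on source IP/port cannot guarantee. The client's very first Initial packet carries a client-chosen DCID, so a real balancer also needs a fallback (such as hashing) for that case; this sketch ignores it.
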
Original | Discuss on HN