Developer Tools

OpenAI's WebRTC problem

WebRTC's aggressive packet dropping degrades AI voice prompts, says veteran engineer.

Deep Dive

A certified WebRTC expert who rewrote the protocol stack at Twitch and later at Discord has penned a scathing critique of OpenAI's choice of WebRTC for its voice AI product. The engineer argues that WebRTC's core design is optimized for real-time human conversation, where occasional packet drops are acceptable, but that this tradeoff is disastrous for AI voice interactions. Users pay for accurate responses, yet under poor network conditions WebRTC aggressively drops audio packets to keep latency low, so a user's prompt can arrive garbled and produce an incorrect AI reply. The irony, the author notes, is that OpenAI adds artificial latency to smooth delivery, only for WebRTC to drop those same packets anyway.

Further compounding the problem: text-to-speech generation now runs faster than real time, so audio could be buffered locally to ride out network blips. But WebRTC's jitter buffer is tiny (20-200ms), and browsers offer no mechanism for retransmitting lost audio packets; Discord tried and failed to enable NACKs. The expert warns that copying OpenAI's architecture is a mistake: voice AI services should instead use protocols that allow buffering and retransmission, prioritizing prompt integrity over conversational latency. The post also touches on the complexity of WebRTC's roughly 45 RFCs and its reliance on de facto standards such as TWCC and REMB.
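The tradeoff the author describes can be illustrated with a toy simulation (hypothetical code, not WebRTC internals; the 10% loss rate is an arbitrary assumption): real-time playout simply loses whatever the network drops, while a retransmitting protocol recovers every packet at the cost of extra round trips.

```python
import random

random.seed(42)

PACKETS = list(range(100))  # 100 sequential audio frames
LOSS_RATE = 0.1             # assumed: 10% of packets lost in transit

def transmit(seq):
    """Simulate one send over an unreliable network; True if it arrives."""
    return random.random() >= LOSS_RATE

# Strategy A: WebRTC-style real-time playout. Lost packets are never
# retransmitted, so their content is simply gone from the prompt.
received_a = [p for p in PACKETS if transmit(p)]

# Strategy B: buffered delivery with retransmission. Lost packets are
# re-requested until they arrive, trading latency for integrity.
received_b = []
retransmissions = 0
for p in PACKETS:
    while not transmit(p):
        retransmissions += 1  # each resend costs roughly one extra RTT
    received_b.append(p)

print(f"real-time playout kept {len(received_a)}/100 frames")
print(f"buffered delivery kept {len(received_b)}/100 frames "
      f"after {retransmissions} retransmissions")
```

Strategy B always delivers the complete prompt, which is the property the post argues matters for voice AI, even though each retransmission adds delay that would be unacceptable in a human-to-human call.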

Key Points
  • WebRTC drops audio packets to maintain low latency, but for AI voice prompts accuracy matters more than speed.
  • OpenAI adds artificial latency before sending packets, and then WebRTC drops those same packets anyway, a double degradation.
  • TTS now generates faster than real time, but WebRTC's tiny jitter buffer (20-200ms) rules out the deeper local buffering that could mask longer network outages.
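The arithmetic behind that last point can be sketched with illustrative numbers (the 2x generation rate is an assumption for the sketch, not a figure from the post): when TTS outpaces playback, every second of playback banks surplus audio client-side, far exceeding what a 200ms jitter buffer can hold.

```python
# Illustrative numbers, assumed for the sketch:
GEN_RATE = 2.0       # audio-seconds of TTS produced per wall-second
PLAY_RATE = 1.0      # audio-seconds consumed per wall-second
JITTER_BUFFER = 0.2  # WebRTC jitter buffer ceiling, in seconds

def local_buffer(wall_seconds):
    """Surplus audio banked client-side after this much playback."""
    return (GEN_RATE - PLAY_RATE) * wall_seconds

# After 5 s of playback the client could hold 5 s of audio,
# enough to ride out a multi-second network blip...
print(local_buffer(5))   # 5.0 seconds banked
# ...while WebRTC's jitter buffer can never mask a gap beyond ~200 ms.
print(JITTER_BUFFER)
```

The gap between those two numbers is the post's core complaint: the headroom for loss-proof delivery exists, but WebRTC's design throws it away.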

Why It Matters

Voice AI developers should avoid copying OpenAI's WebRTC stack, or they risk a degraded user experience from packet loss.