Report #84752
[gotcha] Displaying streamed tokens at raw network speed creates stuttering text that users perceive as lower quality
Buffer streamed tokens and release them to the DOM at a controlled, readable pace, typically 30 to 60 tokens per second for English text, so that network arrival is decoupled from visual display. If the buffer empties during network lag, show a subtle still-generating indicator rather than letting the text freeze. The result reads more smoothly and, counter-intuitively, makes the AI feel faster and more competent.
Journey Context:
The naive implementation pipes server-sent events (SSE) directly to the DOM, so each token appears the instant it arrives from the API. Network delivery is bursty, however: tokens arrive in clumps separated by gaps, which produces a stuttering, uneven display. Users experience this as glitchy and low-quality, even though the total time to completion is the same as it would be with smooth delivery. Research in human-computer interaction shows that smooth, consistent text appearance is perceived as faster and higher-quality than bursty delivery, even when the bursty version technically completes sooner. The counter-intuitive insight is that a small display buffer, which adds latency, makes the experience feel faster. To implement it, accumulate incoming tokens in a buffer and release them to the DOM at a steady rate. This decoupling also addresses the case where fast models deliver tokens faster than people can read, which creates anxiety and reduces comprehension. The streaming documentation describes the SSE delivery mechanism whose burstiness makes this buffering pattern necessary.
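The buffering pattern described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `PacedTokenBuffer`, `Sink`, `push`, `end`, and `tick` are hypothetical names, the default of 40 tokens per second is simply a value inside the 30 to 60 range mentioned above, and the DOM is abstracted behind a small `Sink` interface so the pacing logic stays independent of any framework.

```typescript
// Hypothetical sketch: PacedTokenBuffer, Sink, push, end, and tick are
// illustrative names, not part of any library or API.

interface Sink {
  append(text: string): void;         // e.g. element.textContent += text
  setWaiting(waiting: boolean): void; // show or hide a "still generating" hint
}

class PacedTokenBuffer {
  private queue: string[] = [];
  private credit = 0; // how many tokens we are currently allowed to display
  private done = false;
  private waiting = false;
  private sink: Sink;
  private tokensPerSecond: number;

  constructor(sink: Sink, tokensPerSecond: number = 40) {
    this.sink = sink;
    this.tokensPerSecond = tokensPerSecond;
  }

  // Called from the network side whenever a token arrives (possibly in bursts).
  push(token: string): void {
    this.queue.push(token);
  }

  // Called once when the stream has finished.
  end(): void {
    this.done = true;
  }

  // Called from the display side with the time elapsed since the last call.
  // Returns true once the stream has ended and everything has been displayed.
  tick(elapsedMs: number): boolean {
    this.credit += (elapsedMs * this.tokensPerSecond) / 1000;

    let out = "";
    while (this.credit >= 1 && this.queue.length > 0) {
      out += this.queue.shift() ?? "";
      this.credit -= 1;
    }
    if (out.length > 0) this.sink.append(out);

    const empty = this.queue.length === 0;
    // Do not bank credit while starved, otherwise the next network burst
    // would be dumped to the screen all at once.
    if (empty) this.credit = Math.min(this.credit, 1);

    // Show the indicator only when starved mid-stream, and only on change.
    const shouldWait = empty && !this.done;
    if (shouldWait !== this.waiting) {
      this.waiting = shouldWait;
      this.sink.setWaiting(shouldWait);
    }
    return empty && this.done;
  }
}
```

In a browser, one plausible wiring is to call `push` from the SSE message handler and drive `tick` from `requestAnimationFrame`, passing the difference between successive frame timestamps. Keeping the clock outside the class is a deliberate choice in this sketch: it makes the pacing deterministic and easy to test without timers.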
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:50:46.985629+00:00— report_created — created