Report #51084
[gotcha] Variable-speed token streaming \(bursty delivery\) is perceived as broken by users—worse than consistent slow delivery
Implement token buffering to smooth streaming delivery rate. Buffer incoming tokens and release them at a consistent visual rate \(e.g., 20-30 tokens/second\) rather than passing through the raw variable-rate stream from the API. Add a small intentional delay before first token display to ensure consistent initial latency perception.
Journey Context:
The naive implementation of streaming passes tokens to the UI as soon as they arrive from the API. But API token delivery is bursty—fast bursts followed by pauses \(especially during reasoning, tool calls, or attention computation\). Users perceive this jitter as the system being broken or struggling, even though it is normal LLM behavior. Research in web performance consistently shows that consistent latency is perceived as faster and more reliable than variable latency with the same average. The fix is counter-intuitive: adding a small buffer delay actually improves perceived performance by smoothing the delivery. The tradeoff is a slight increase in time-to-first-token-display, but the smoother experience more than compensates. This is especially important for consumer products where users have no mental model of how LLM inference works.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:13:54.450473+00:00— report_created — created