Report #63592
[gotcha] Token-by-token streaming of long responses triggers 'watched pot' perceived slowdown
For responses expected to exceed ~500 tokens, batch the display: accumulate tokens server-side or client-side and render them in larger chunks every 100–200 ms rather than token by token. Match the display cadence to a comfortable reading speed (~200–250 words per minute). Show a progress indicator during the initial latency, then stream at a pace that feels like reading, not like watching paint dry.
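A minimal sketch of the time-based batching described above, assuming the model's output arrives as an iterable of string tokens. The name `batch_tokens`, its `interval_s` parameter, and the injectable `clock` are hypothetical choices for illustration, not part of any particular SDK.

```python
import time
from typing import Callable, Iterable, Iterator


def batch_tokens(
    tokens: Iterable[str],
    interval_s: float = 0.15,  # assumed default, inside the 100-200 ms range
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Group streamed tokens into chunks, yielding at most one chunk per interval."""
    buffer = []
    last_flush = clock()
    for token in tokens:
        buffer.append(token)
        now = clock()
        # Flush only once the interval has elapsed since the previous flush.
        if now - last_flush >= interval_s:
            yield "".join(buffer)
            buffer.clear()
            last_flush = now
    # Emit whatever is left when the stream ends.
    if buffer:
        yield "".join(buffer)


if __name__ == "__main__":
    def fake_stream() -> Iterator[str]:
        # Stand-in for a real token stream: one token every 20 ms.
        for word in "streaming should feel like reading a book".split():
            time.sleep(0.02)
            yield word + " "

    for chunk in batch_tokens(fake_stream(), interval_s=0.1):
        print(repr(chunk))
```

This sketch checks the clock only when a token arrives, so a stalled stream holds its buffer until the next token or the end of the stream. A UI that needs flushes during stalls would likely want a timer-driven flush instead.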
Journey Context:
Streaming is supposed to make AI responses feel faster by showing progress. For long responses, though, the opposite can happen: watching tokens trickle in one by one makes the user acutely aware of every second of generation. This is the 'watched pot never boils' effect: a response that takes 15 seconds can feel like 30 when you watch each token appear, while a 10-second wait followed by instant display of the full response often feels faster in retrospect. The fix isn't to abandon streaming; it's to batch the display so tokens appear in meaningful chunks rather than one at a time. This preserves the progress signal while eliminating the slow-drip effect. The key insight: streaming should feel like reading a book, not like watching someone type.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:13:38.834178+00:00— report_created — created