Report #24757
[gotcha] Streaming long AI responses feels slower to users than showing a spinner then the complete response
For responses expected to take more than 3–5 seconds, use a hybrid: show a progress state during generation, then stream the output at a readable pace or display it all at once. Measure user satisfaction and time-to-comprehension, not time-to-first-token.
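A minimal sketch of the hybrid described above, not a definitive implementation. The names `choose_strategy`, `reveal_plan`, and `render_hybrid` are hypothetical. The 3-second threshold, the 4-words-per-second pace, and the 8-word chunk size are assumptions to tune against user testing.

```python
import time
from typing import Callable, List, Tuple

HYBRID_THRESHOLD_S = 3.0  # assumption: lower bound of the 3-5 s range above
WORDS_PER_SECOND = 4.0    # assumption: a comfortable reading pace
WORDS_PER_CHUNK = 8       # assumption: reveal roughly a phrase at a time


def choose_strategy(expected_seconds: float) -> str:
    """Stream short responses directly; use the hybrid for long ones."""
    if expected_seconds > HYBRID_THRESHOLD_S:
        return "progress_then_reveal"
    return "stream"


def reveal_plan(text: str, all_at_once: bool = False) -> List[Tuple[float, str]]:
    """Return (delay_before_seconds, chunk) pairs for revealing finished text."""
    if all_at_once:
        return [(0.0, text)]
    words = text.split()
    plan: List[Tuple[float, str]] = []
    delay = 0.0  # the first chunk appears immediately
    for i in range(0, len(words), WORDS_PER_CHUNK):
        chunk = words[i:i + WORDS_PER_CHUNK]
        plan.append((delay, " ".join(chunk)))
        # Wait about as long as the chunk just shown takes to read.
        delay = len(chunk) / WORDS_PER_SECOND
    return plan


def render_hybrid(
    generate: Callable[[], str],
    show_progress: Callable[[str], None],
    emit: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
    all_at_once: bool = False,
) -> None:
    """Show a progress state while generating, then reveal the finished text."""
    show_progress("Generating response...")
    text = generate()  # blocks until the complete response is available
    for delay, chunk in reveal_plan(text, all_at_once):
        sleep(delay)
        emit(chunk)
```

Passing `sleep` and `emit` as parameters keeps the pacing logic testable without a real UI or real delays. With `all_at_once=True` the finished response is emitted in a single step.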
Journey Context:
Streaming was adopted to reduce perceived latency, since tokens appear immediately. But for long responses, streaming creates a 'watching paint dry' effect where users watch the AI slowly type for 20+ seconds. Nielsen Norman Group's response-time guidelines put the limits at roughly 1 second, after which users notice the delay and need feedback, and roughly 10 seconds, after which their attention drifts. A 5-second spinner followed by an instant complete response often tests as feeling faster than a 15-second stream, because the user can read the whole answer at their own pace as soon as it appears instead of waiting for it to materialize token by token. The key metric is time-to-comprehension, not time-to-first-token.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:57:41.422130+00:00— report_created — created