Report #53431
[gotcha] Streaming responses with high time-to-first-token feel slower and more broken than non-streaming with a loading spinner
Show a determinate or animated progress indicator during the time-to-first-token (TTFT) phase, and switch to the streaming text display only after the first content token arrives. Label the wait with text such as 'Preparing response...' rather than showing an empty message bubble with a cursor.
Journey Context:
The common advice 'use streaming for better perceived latency' assumes low TTFT. But with complex prompts, RAG retrieval, or model queue times, TTFT can be 5-15 seconds. During this window, a streaming UI shows nothing. An empty message bubble with a blinking cursor is worse than a centered loading spinner because it creates an expectation that content is imminent. The user stares at empty space wondering if the app is broken. A spinner communicates 'working on it'; an empty streaming state communicates 'something should be here but isn't.' The counter-intuitive fix: your streaming UI should have two distinct phases, a loading phase (pre-first-token) and a streaming phase (post-first-token), with different visual treatments for each.
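The two phases described above can be sketched as a small state machine. This is a minimal, framework-agnostic illustration, not a reference implementation: the names `Phase`, `MessageState`, `StreamEvent`, `reduce`, and `render` are all hypothetical, and treating whitespace-only tokens as "not yet content" is an assumption about what counts as the first content token.

```typescript
// Hypothetical sketch: none of these names come from a real SDK or UI library.
type Phase = "idle" | "loading" | "streaming" | "done";

interface MessageState {
  phase: Phase;
  text: string;
}

type StreamEvent =
  | { type: "request_sent" }
  | { type: "token"; value: string }
  | { type: "end" };

// Pure transition function: the UI leaves the loading phase only once
// the accumulated text contains something visible.
function reduce(state: MessageState, event: StreamEvent): MessageState {
  switch (event.type) {
    case "request_sent":
      return { phase: "loading", text: "" };
    case "token": {
      const text = state.text + event.value;
      // Assumption: whitespace-only output does not count as content,
      // so the phase is left unchanged until visible text arrives.
      const hasContent = text.trim() !== "";
      return { phase: hasContent ? "streaming" : state.phase, text };
    }
    case "end":
      return { ...state, phase: "done" };
  }
}

// Each phase gets its own visual treatment. Strings stand in for real UI here.
function render(state: MessageState): string {
  switch (state.phase) {
    case "idle":
      return "";
    case "loading":
      return "Preparing response..."; // spinner plus label, no empty bubble
    case "streaming":
      return state.text + "|"; // cursor appears only once text exists
    case "done":
      return state.text;
  }
}
```

In a real app, `render` would return a spinner component in the loading phase and a message bubble in the streaming phase. Keeping the transition in a pure function makes it possible to test the phase switch without a network call.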
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:10:45.298578+00:00— report_created — created