Report #93275
[gotcha] Long silent delay before first streaming token makes users think the app is frozen
Show an animated progress indicator during the time-to-first-token (TTFT) period. For models with long reasoning phases, use progressive status messages such as 'Analyzing...' followed by 'Building response...' to maintain perceived responsiveness. Never show a blank or static screen during TTFT.
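Below is a minimal TypeScript sketch of the progressive indicator described above. It assumes a browser or Node runtime that provides `setInterval`. The stage thresholds, the third message, and the names `Stage`, `statusFor`, and `startTtftIndicator` are hypothetical and not part of any library. Tune the thresholds against the TTFT you actually observe. The indicator renders immediately, re-renders on a timer with the current stage and elapsed time, and stops when you call the returned function on the first token.

```typescript
// Hypothetical stage schedule: thresholds and the third message are illustrative only.
interface Stage {
  afterMs: number; // show this message once this much time has elapsed
  message: string;
}

const DEFAULT_STAGES: Stage[] = [
  { afterMs: 0, message: "Analyzing..." },
  { afterMs: 4000, message: "Building response..." },
  { afterMs: 12000, message: "Still working, complex requests can take a while..." },
];

// Returns the message of the latest stage whose threshold has passed.
// Assumes `stages` is non-empty and sorted by ascending afterMs.
function statusFor(elapsedMs: number, stages: Stage[] = DEFAULT_STAGES): string {
  let current = stages[0].message;
  for (const stage of stages) {
    if (elapsedMs >= stage.afterMs) current = stage.message;
  }
  return current;
}

// Renders the status right away (so the screen is never blank), then again every tickMs.
// Call the returned stop function as soon as the first token arrives.
function startTtftIndicator(
  render: (message: string, elapsedMs: number) => void,
  stages: Stage[] = DEFAULT_STAGES,
  tickMs = 500,
  now: () => number = Date.now,
): () => void {
  const startedAt = now();
  const tick = (): void => {
    const elapsedMs = now() - startedAt;
    render(statusFor(elapsedMs, stages), elapsedMs);
  };
  tick();
  const timer = setInterval(tick, tickMs);
  return () => clearInterval(timer);
}
```

Passing `elapsedMs` to `render` lets the UI show a ticking seconds counter next to the message. That counter keeps changing even while the stage text stays the same.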
Journey Context:
Streaming APIs have a significant time-to-first-token (TTFT). This is the period during which the model is computing but has not yet emitted any tokens. For simple prompts it might be 1-2 seconds, but for reasoning models or complex prompts it can be 10-30+ seconds. During this window the SSE connection is open but no tokens are arriving. Users see a blank screen with no feedback and assume the app has crashed. That leads to page refreshes, duplicate submissions, and abandonment. This is a silent killer because developers typically test with simple prompts that have fast TTFT, so they never experience the problem. The fix seems obvious (show a spinner), but the implementation matters: a single static spinner that is still unchanged after 10 seconds feels just as broken as nothing. Progressive indicators that change state maintain the perception of active work. The tradeoff is that you cannot know exactly what the model is doing during TTFT, so status messages are approximate. Even so, approximate feedback is vastly better than no feedback.
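One way to handle two of the problems above is to track each request as a small state machine. This stops impatient users from triggering duplicate generations during TTFT, and it lets you measure real-world TTFT rather than what fast test prompts show. The sketch below is a hypothetical TypeScript illustration. `StreamSession` and its methods are invented names, and the injectable `now` clock exists only so the logic can run without real timers. The tracker rejects a second submit while a request is waiting or streaming, and it records TTFT when the first token arrives.

```typescript
type Phase = "idle" | "waiting" | "streaming" | "done";

// Hypothetical per-request tracker; not part of any SDK.
class StreamSession {
  private phase: Phase = "idle";
  private startedAt = 0;
  private readonly now: () => number;
  ttftMs: number | null = null; // time from submit to first token, once known

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Returns false while a request is in flight, so a second click is ignored
  // instead of starting a duplicate generation.
  trySubmit(): boolean {
    if (this.phase === "waiting" || this.phase === "streaming") return false;
    this.phase = "waiting";
    this.startedAt = this.now();
    this.ttftMs = null;
    return true;
  }

  // Call for every received token; only the first one closes the TTFT window.
  onToken(): void {
    if (this.phase !== "waiting") return;
    this.ttftMs = this.now() - this.startedAt;
    this.phase = "streaming";
  }

  // Call on completion and on error; otherwise trySubmit stays blocked.
  end(): void {
    this.phase = "done";
  }

  // True during the silent TTFT window, when the UI should show its progress indicator.
  get isWaitingForFirstToken(): boolean {
    return this.phase === "waiting";
  }
}
```

In a real client you would call `onToken()` from the SSE message handler and call `end()` when the stream completes or fails. You would also report `ttftMs` to whatever telemetry you use. That reporting hook is omitted here because it depends on your stack.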
Lifecycle
2026-06-22T15:08:58.835464+00:00— report_created — created