Report #55194
[gotcha] Streaming chat UI shows blank during time-to-first-token — users think the app is frozen
Render an immediate 'thinking' state (pulsing indicator, not a skeleton loader) on submit. Transition to streaming text only when the first token arrives. Never leave the response area empty during time-to-first-token (TTFT).
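The transition above can be modeled as a small pure state reducer. This is a minimal TypeScript sketch, and all of its names are hypothetical rather than taken from any framework: `ResponseState`, `ResponseEvent`, `reduce`, `view` and the "(empty response)" placeholder. It makes one assumption beyond the report: an empty chunk does not count as the first token, because switching to the streaming state on an empty string would bring the blank area back.

```typescript
// Hypothetical state model for the response area (illustrative names only).
type ResponseState =
  | { kind: "idle" }
  | { kind: "thinking" }
  | { kind: "streaming"; text: string }
  | { kind: "done"; text: string };

type ResponseEvent =
  | { type: "submit" }
  | { type: "token"; text: string }
  | { type: "end" };

function reduce(state: ResponseState, event: ResponseEvent): ResponseState {
  switch (event.type) {
    case "submit":
      // Enter 'thinking' synchronously on submit, before any network round trip.
      return { kind: "thinking" };
    case "token":
      if (event.text === "") return state; // assumption: empty chunks are not the first token
      if (state.kind === "thinking") return { kind: "streaming", text: event.text };
      if (state.kind === "streaming") return { kind: "streaming", text: state.text + event.text };
      return state; // stray token outside an active request: ignore it
    case "end":
      if (state.kind === "thinking" || state.kind === "streaming") {
        return { kind: "done", text: state.kind === "streaming" ? state.text : "" };
      }
      return state;
  }
}

// What the response area shows in each state.
function view(state: ResponseState): string {
  switch (state.kind) {
    case "idle":
      return ""; // nothing submitted yet, so an empty area is expected here
    case "thinking":
      return "● Thinking…"; // pulsing dot plus text; deliberately no skeleton layout
    case "streaming":
      return state.text; // non-empty, because empty chunks never reach this state
    case "done":
      return state.text === "" ? "(empty response)" : state.text; // hypothetical placeholder
  }
}
```

Keeping the reducer free of DOM code means the "never blank after submit" rule can be checked without a browser.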
Journey Context:
Streaming creates an expectation of instant output. A TTFT of 1–5 seconds (model queue + prefill + inference) feels like a hang, not computation. The counter-intuitive insight is that an explicit 'processing' animation feels faster than a blank response area, even though it draws attention to the wait, because it confirms the input was received. Skeleton loaders are dangerous here: they imply a known response structure, which LLM output doesn't have. A simple pulsing dot or 'Thinking…' text is safer and more honest. Showing nothing and buffering until the first chunk arrives is the worst of both worlds: slow perceived startup with no feedback.
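The same idea seen from the stream consumer, as a TypeScript sketch. It paints the thinking state before the first `await`, so feedback does not depend on TTFT at all. It also records TTFT when the first non-empty chunk arrives. `consumeStream`, the `View` type, the `render` callback and the injectable `now` clock are hypothetical names. The input is assumed to be any async iterable of text chunks, for example a wrapper that decodes a fetch response body into strings.

```typescript
// Hypothetical view model handed to the UI layer; not a real framework type.
type View = { kind: "thinking" } | { kind: "streaming"; text: string };

async function consumeStream(
  chunks: AsyncIterable<string>,
  render: (view: View) => void,
  now: () => number = () => Date.now(), // injectable clock so TTFT can be tested
): Promise<{ text: string; ttftMs: number | null }> {
  const submittedAt = now();
  // Runs synchronously when consumeStream is called: feedback appears before any await.
  render({ kind: "thinking" });

  let text = "";
  let ttftMs: number | null = null; // stays null if the stream yields no text
  for await (const chunk of chunks) {
    if (chunk === "") continue; // keep the indicator up through empty chunks
    if (ttftMs === null) ttftMs = now() - submittedAt; // first real token marks TTFT
    text += chunk;
    render({ kind: "streaming", text });
  }
  return { text, ttftMs };
}
```

The buffering anti-pattern described above would correspond to calling `render` only after the loop finishes. Rendering on every chunk keeps the sketch short. A real UI may prefer to coalesce updates, for example once per animation frame.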
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:08:10.509373+00:00— report_created — created