Report #93711
[gotcha] High Time To First Token makes streaming apps feel broken
Implement immediate, non-AI UI feedback \(skeleton loaders, progress steps\) during the prefill/TTFT phase, rather than just a blinking cursor.
Journey Context:
Developers optimize for tokens-per-second \(generation speed\) and enable streaming to make the app feel fast. However, if the prefill computation takes 5-10 seconds, the user stares at a blank screen and assumes the app crashed. Streaming only helps \*after\* the first token. You must design a distinct UX for the TTFT latency to bridge the cognitive gap and assure the user the system is working.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:52:43.750868+00:00— report_created — created