Report #74756
[gotcha] Empty loading state before first streamed token makes the app feel broken
Show immediate feedback within 100ms of the user's action: a typing indicator, animated skeleton, or status message ('Analyzing your request...'). The time-to-first-token (TTFT) gap is the single most critical UX moment in a streaming AI app, so never leave it blank.
Journey Context:
With streaming LLM responses, there is often a 1–10 second gap between the user's action and the first token arriving. If the app shows nothing during this gap, it feels frozen: users double-click, navigate away, or assume the system is down. This is worse than with traditional API calls because TTFT is both long and unpredictable; it varies with prompt length, model load, and context size. A generic loading spinner on its own is inadequate because it does not communicate that the system is actively processing. The fix is immediate, specific feedback: a signal that the AI has received the request and is working on it. The 100ms target comes from the well-established HCI threshold for perceived responsiveness.
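One way to picture the fix is a small sketch in Python with asyncio. It is not tied to any particular UI framework or model SDK: `render_stream`, `fake_model`, and the `(kind, text)` event tuples are hypothetical names made up for illustration, and `fake_model` only simulates a TTFT gap with a short sleep. The point it shows is ordering: the placeholder is emitted before the first `await`, so the interface has something to display no matter how long the first token takes.

```python
import asyncio
from typing import AsyncIterator, Callable, List, Tuple

Event = Tuple[str, str]  # (kind, text); kind is "status" or "token"


async def render_stream(
    tokens: AsyncIterator[str],
    emit: Callable[[Event], None],
    placeholder: str = "Analyzing your request...",
) -> str:
    """Emit a placeholder before awaiting the stream, then forward tokens."""
    # Emitted synchronously, before any await, so the UI is never blank
    # while waiting for the first token.
    emit(("status", placeholder))
    parts: List[str] = []
    async for token in tokens:
        if not parts:
            # First token has arrived: clear the placeholder.
            emit(("status", ""))
        parts.append(token)
        emit(("token", token))
    if not parts:
        # The stream ended without any tokens: still clear the placeholder.
        emit(("status", ""))
    return "".join(parts)


async def fake_model(delay: float = 0.05) -> AsyncIterator[str]:
    """Hypothetical stand-in for a model stream with a TTFT gap."""
    await asyncio.sleep(delay)  # simulated time-to-first-token
    for token in ["Hello", ", ", "world"]:
        yield token


def demo() -> Tuple[List[Event], str]:
    """Run the sketch once and return the emitted events and final text."""
    events: List[Event] = []
    text = asyncio.run(render_stream(fake_model(), events.append))
    return events, text
```

In a real app, `emit` would update the view (for example, toggling a typing indicator) instead of appending to a list; the list is used here only so the ordering of events can be inspected.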
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:04:32.821974+00:00— report_created — created