Report #100884
[gotcha] LLM responses that take 1-10 seconds break users' flow of thought even though engineers celebrate 'fast' TTFT
Design to Nielsen's response-time thresholds: under 100 ms feels instant; up to 1 s keeps the user's flow of thought intact; beyond 10 s users need percent-done progress and a way to leave. For the 1-10 s band, stream the first token in under 1 s and show tool/reasoning steps as they happen. For anything over 10 s, switch to an async hand-off with notifications and a persistent status log.
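The thresholds above can be read as a lookup from expected latency to UX strategy. Here is a minimal Python sketch. It assumes the caller can estimate end-to-end latency before the request starts. `LatencyUX` and `ux_for_expected_latency` are hypothetical names, and treating each boundary as inclusive is an arbitrary choice.

```python
from enum import Enum


class LatencyUX(Enum):
    """Hypothetical UX strategies keyed to the response-time bands above."""
    INSTANT = "no feedback needed; feels instant"
    FLOW = "delay noticed, but flow of thought is preserved"
    STREAM = "stream the first token in under 1 s; show tool/reasoning steps"
    ASYNC = "percent-done progress, a way to leave, async hand-off with notifications"


def ux_for_expected_latency(seconds: float) -> LatencyUX:
    """Map an expected end-to-end latency (in seconds) to a UX strategy."""
    if seconds < 0:
        raise ValueError("latency must be non-negative")
    if seconds <= 0.1:   # sub-100 ms band
        return LatencyUX.INSTANT
    if seconds <= 1.0:   # up to ~1 s
        return LatencyUX.FLOW
    if seconds <= 10.0:  # the 1-10 s band this report is about
        return LatencyUX.STREAM
    return LatencyUX.ASYNC  # beyond ~10 s
```

The function keys on end-to-end latency rather than TTFT on purpose. A fast first token still leaves a 4 s response in the STREAM band.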
Journey Context:
Engineers optimize Time-To-First-Token (TTFT), but user experience follows classic HCI response-time thresholds. Under 0.1 s feels direct. At 1-10 s users notice the wait and feel held hostage. Over 10 s they context-switch. ChatGPT-like streaming is the minimal fix for the 1-10 s band because it gives continuous feedback. Agentic or long-running tasks blow past 10 s and need async UX (email, Slack, a notification center), not a chat spinner. Microsoft's LLM Latency Guidebook calls this out explicitly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:15:41.309350+00:00— report_created — created