Report #100443
[gotcha] LLM response latency violates the sub-second expectation users have for interactive software
Stream the first tokens within 1s; show a progress indicator between 1s and 10s; beyond 10s offer cancellation, backgrounding, or staged delivery.
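A minimal sketch of the threshold rule above, assuming the client tracks how long the user has waited without seeing any output. The function name `feedback_for` and the returned labels are hypothetical, and the handling of the exact 1s and 10s boundaries is an assumption, since the rule does not specify it.

```python
def feedback_for(elapsed_s: float) -> str:
    """Map seconds waited (with no output yet) to the feedback the UI should show.

    Labels are illustrative placeholders, not part of any real UI framework.
    """
    if elapsed_s < 0:
        raise ValueError("elapsed_s must be non-negative")
    if elapsed_s <= 1.0:
        # Within 1s: no indicator needed; first tokens should already be arriving.
        return "none"
    if elapsed_s <= 10.0:
        # Between 1s and 10s: show a progress or "thinking" indicator.
        return "progress_indicator"
    # Beyond 10s: let the user cancel, background the task, or get staged results.
    return "offer_cancel_or_background"
```

A client could call this from a timer tick while waiting for the first token and switch the displayed state whenever the returned label changes.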
Journey Context:
Nielsen's response-time thresholds (0.1s feels instantaneous, 1s keeps the user's train of thought intact, 10s is the limit for holding attention) have held for decades. LLMs routinely miss the 1s mark for the first token and can take 5-10s end-to-end. A blank chat window beyond 1s feels broken; beyond 10s users context-switch. Streaming fixes time-to-first-token, but tokens arrive as sub-word fragments at uneven intervals, so the raw stream still needs word-level buffering to render smoothly. Perceived latency matters more than wall-clock time, so stage the UI: immediate echo of the user's message, then a thinking indicator, then the streamed answer, then follow-ups.
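One way to do the word-level buffering mentioned above is to hold back incoming fragments until a whitespace boundary arrives, so the UI never renders half a word. This is a sketch under stated assumptions: tokens are plain string fragments, words are delimited by spaces or newlines, and `buffer_words` is a hypothetical helper, not part of any LLM SDK.

```python
from typing import Iterable, Iterator


def buffer_words(tokens: Iterable[str]) -> Iterator[str]:
    """Re-chunk a stream of token fragments so only whole words are emitted."""
    pending = ""
    for token in tokens:
        pending += token
        # Everything up to and including the last whitespace is complete words.
        cut = max(pending.rfind(" "), pending.rfind("\n"))
        if cut >= 0:
            yield pending[: cut + 1]
            pending = pending[cut + 1:]
    # Flush whatever remains once the stream ends.
    if pending:
        yield pending


if __name__ == "__main__":
    # Simulated fragments; a real client would iterate over its streaming response.
    for chunk in buffer_words(["Hel", "lo", " wor", "ld", "!"]):
        print(repr(chunk))
```

Joining the emitted chunks reproduces the original text exactly; only the chunk boundaries change. The trade-off is that a long word with no whitespace is held back until it completes, so a real client may also want a time-based flush.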
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:14:16.884149+00:00— report_created — created