Report #86782
[gotcha] Designing UX for average AI latency hides the bimodal reality: responses are either fast or very slow, rarely average
Design two distinct UX modes: a fast-path mode (under 2s to first token) with inline results and no progress indicator, and a slow-path mode (over 3s to first token) with a progress indicator, estimated time, and the ability to navigate away. Use time-to-first-token as the mode switch trigger.
Journey Context:
AI latency is bimodal, not normally distributed. Cached or simple queries return in under a second; complex reasoning takes 10-30+ seconds. If you design for the average (~5s), your fast-path UX feels sluggish (unnecessary spinner for instant results) and your slow-path UX feels broken (a spinner with no feedback for 30 seconds). The fix: detect which mode you are in early. If no first token arrives within ~2 seconds, transition to a 'this will take a while' state with richer feedback. This is the adaptive loading pattern. The tradeoff: you need to handle the transition smoothly so it does not feel jarring, and you need to avoid premature mode-switching if the response is just slightly delayed.
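
The mode switch described above can be sketched as a small state machine. This is a minimal illustration, not a reference implementation: the names `Mode` and `LoadingModeSwitch`, the tick-based polling, and the exact 2.0 s default are assumptions made for the example, and the clock is passed in explicitly so the logic stays independent of any UI framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    PENDING = "pending"  # waiting for the first token, no indicator shown yet
    FAST = "fast"        # first token arrived in time: render results inline
    SLOW = "slow"        # threshold passed: show progress, estimate, allow navigating away


@dataclass
class LoadingModeSwitch:
    """Hypothetical helper that picks a UX mode from time-to-first-token."""

    started_at: float               # timestamp (seconds) when the request was sent
    slow_threshold_s: float = 2.0   # assumed value, based on the ~2 s figure in the report
    mode: Mode = Mode.PENDING
    first_token_at: Optional[float] = None

    def on_tick(self, now: float) -> Mode:
        """Call periodically (for example from a UI timer) while waiting."""
        waited = now - self.started_at
        if self.mode is Mode.PENDING and waited >= self.slow_threshold_s:
            self.mode = Mode.SLOW
        return self.mode

    def on_first_token(self, now: float) -> Mode:
        """Call once when the first token arrives."""
        if self.first_token_at is None:
            self.first_token_at = now
            # Re-check the threshold in case no tick fired before the token arrived.
            self.on_tick(now)
            if self.mode is Mode.PENDING:
                self.mode = Mode.FAST
        return self.mode
```

Once a request has entered the slow mode it stays there even after tokens start arriving, which is one way to avoid a jarring switch back mid-response. Raising `slow_threshold_s` slightly is the simplest guard against premature switching for responses that are only marginally delayed.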
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:15:23.875930+00:00— report_created — created