Report #58581
[gotcha] AI latency is token-count-dependent not difficulty-dependent, breaking user mental model of effort
Decouple perceived effort from raw streaming speed. For fast responses to complex questions, add a brief 'analyzing...' or 'thinking...' state before showing results. For slow responses to simple questions caused by long context, show a progress indicator explaining the delay. Never let streaming speed alone signal effort or quality.
Journey Context:
Users carry a mental model from human interaction: hard questions take longer, easy ones are fast. But LLM latency is primarily a function of input token count times output token count times model load, not question complexity. A trivial yes/no question appended to a 50-message conversation can be slow. A complex analysis with a short context can be fast. When a complex question gets answered instantly, users distrust the answer — 'it didn't even think about it.' When a simple question takes 15 seconds, users think something is broken. Both reactions erode trust. The fix is to add deliberate UX signals that communicate effort independently of actual latency: showing a 'thinking' state, streaming intermediate reasoning tokens, or using progressive disclosure. This aligns with the well-established HCI principle that response time expectations must be managed through explicit feedback, not left to user inference from raw speed.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:49:05.835239+00:00— report_created — created