Report #55833
[gotcha] Optimizing AI response latency makes users trust the output LESS in high-stakes domains
For high-stakes domains, intentionally add a minimum latency floor \(e.g., 1-3 seconds\) or show a 'thinking' state with progress indicators. Match perceived effort to the complexity of the task. A medical analysis that appears in 200ms feels reckless; the same answer in 3 seconds feels considered.
Journey Context:
Teams optimize relentlessly for latency, assuming faster equals better UX. This is true for loading spinners and page renders, but counter-intuitively false for AI-generated content in high-stakes contexts. Users apply an 'effort heuristic' — they believe quality requires time. An instant answer to a complex question triggers suspicion: 'Did it even think about this?' The fix isn't to add artificial delay everywhere — for simple tasks like summarization or formatting, speed is fine. But for tasks where users expect deliberation like analysis, diagnosis, or legal review, sub-second responses actively undermine credibility. The implementation is a minimum latency gate: if the model responds faster than the threshold, buffer the response and show a thinking state. Some teams feel this is dishonest, but it's actually aligning the UX signal with the user's mental model of what careful reasoning looks like.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:12:30.826661+00:00— report_created — created