Report #79563
[gotcha] Variable AI response latency (jitter) creates worse UX than consistently slower responses
Optimize for consistent time-to-first-token (TTFT) rather than minimum average latency. If TTFT varies widely, implement a hold buffer: show a thinking state for a fixed minimum duration (e.g., 300ms) before revealing the first token, and reveal first tokens that arrive after the hold immediately. This smooths perceived latency and prevents the fast/slow whiplash that erodes trust.
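The hold buffer described above can be sketched as an async wrapper around a token stream. This is a minimal illustration under assumptions, not a reference implementation: `hold_first_token`, `fake_stream`, and `measure` are hypothetical names, the model stream is assumed to be an `asyncio` async iterator of string tokens, and the hold clock starts when the wrapper is created (which should be the moment the request is sent). Rendering the thinking state itself is left to the UI layer.

```python
import asyncio
import time
from collections.abc import AsyncIterator


def hold_first_token(
    tokens: AsyncIterator[str], min_hold_s: float = 0.3
) -> AsyncIterator[str]:
    """Hypothetical helper: re-yield `tokens`, but never release the first
    token earlier than `min_hold_s` seconds after this function is called."""
    started_at = time.monotonic()  # captured eagerly, when the request is sent

    async def held() -> AsyncIterator[str]:
        first = True
        async for tok in tokens:
            if first:
                first = False
                elapsed = time.monotonic() - started_at
                if elapsed < min_hold_s:
                    # Fast response: keep the thinking state up for the
                    # remainder of the hold window before revealing anything.
                    await asyncio.sleep(min_hold_s - elapsed)
                # Slow response: the hold already expired, so reveal at once.
            yield tok

    return held()


# --- Demo: a fake model stream with a configurable TTFT (assumption) ---
async def fake_stream(ttft_s: float, words: list[str]) -> AsyncIterator[str]:
    await asyncio.sleep(ttft_s)  # simulated time-to-first-token
    for w in words:
        yield w


async def measure(ttft_s: float, min_hold_s: float = 0.3) -> tuple[float, list[str]]:
    """Return (seconds until the first token was revealed, all tokens)."""
    start = time.monotonic()
    stream = hold_first_token(
        fake_stream(ttft_s, ["The", "answer", "is", "42"]), min_hold_s
    )
    first_at = float("nan")
    out: list[str] = []
    async for tok in stream:
        if not out:
            first_at = time.monotonic() - start
        out.append(tok)
    return first_at, out
```

With the default 0.3s hold, a stream whose first token arrives after about 50ms is revealed at roughly 300ms, while one whose first token arrives after about 500ms is revealed at roughly 500ms with no extra delay. Only the first token is held; later tokens stream through unchanged, so total generation time grows by at most the hold.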
Journey Context:
Users calibrate expectations based on observed response patterns. If an AI sometimes responds in 500ms and sometimes in 8s with no visible difference in query complexity, users cannot form accurate mental models: fast responses set unrealistic expectations, and slow responses then feel broken. Research shows that consistent latency, even if slower on average, is perceived as more reliable than variable latency. The AI-specific nuance is that time-to-first-token matters far more than total generation time. A fast TTFT with slow streaming feels responsive; a slow TTFT with fast streaming feels sluggish, even when total time is identical. The fix is to smooth the TTFT distribution with a small hold buffer, sacrificing a few hundred milliseconds on fast responses to reduce the jarring variance that makes slow responses feel like failures rather than normal variation. Because the hold only delays fast first tokens, it narrows the spread from below; it does not speed up slow ones.
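The trade-off can be made concrete with a small calculation: under a hold of h seconds, each response's perceived TTFT is max(actual TTFT, h). The sketch below applies that rule to a list of TTFT samples and compares the spread with and without a hold. The helper name `perceived_spread` and the sample values are hypothetical, chosen only for illustration; they are not measurements.

```python
import statistics


def perceived_spread(ttft_samples_s: list[float], hold_s: float) -> dict[str, float]:
    """Hypothetical helper: summarize the TTFT spread a user perceives when
    first tokens are held back for at least `hold_s` seconds."""
    # A hold only delays fast first tokens; slow ones pass through unchanged.
    perceived = [max(t, hold_s) for t in ttft_samples_s]
    return {
        "min_s": min(perceived),
        "max_s": max(perceived),
        "max_to_min": max(perceived) / min(perceived),
        "stdev_s": statistics.pstdev(perceived),
    }


# Hypothetical TTFT samples in seconds (illustrative, not measured data).
samples = [0.12, 0.18, 0.25, 0.90, 1.40]
raw = perceived_spread(samples, hold_s=0.0)   # no hold: the raw distribution
held = perceived_spread(samples, hold_s=0.3)  # 300ms hold, as in the example
for name, stats in (("raw", raw), ("held", held)):
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

With these samples, the hold lifts the fastest responses from 0.12s to 0.3s and cuts the max-to-min ratio from about 11.7 to about 4.7, while the slowest response stays at 1.4s.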
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:08:36.935328+00:00— report_created — created