Report #38976
[synthesis] How API latency variance breaks UX predictability in AI products
Design UX for high latency variance by using progressive rendering (streaming tokens) and tracking time-to-first-token (TTFT) rather than total request time. Implement fallback UX patterns (e.g., 'Thinking...' indicators with progress bars) for requests exceeding standard thresholds.
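A minimal sketch of the idea, not tied to any particular LLM SDK: wrap whatever token iterator the client exposes, render each token as it arrives, and record TTFT separately from total time. The names `StreamTiming`, `stream_with_timing`, and `loading_state`, and the 2-second `slow_after` threshold, are illustrative assumptions and do not come from the report.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class StreamTiming:
    """Timing for one streamed response, in seconds."""
    ttft: Optional[float] = None   # time to first token; None if no token arrived
    total: Optional[float] = None  # time until the stream was exhausted


def stream_with_timing(
    tokens: Iterable[str],
    timing: StreamTiming,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Yield tokens as they arrive and record TTFT and total time in `timing`.

    The clock starts when the caller begins iterating, so start consuming
    the stream right after the request is issued.
    """
    start = clock()
    for token in tokens:
        if timing.ttft is None:
            timing.ttft = clock() - start
        yield token  # hand the token to the UI immediately (progressive rendering)
    timing.total = clock() - start


def loading_state(elapsed: float, first_token_seen: bool, slow_after: float = 2.0) -> str:
    """Pick a UI state; `slow_after` is an assumed threshold, tune it per product."""
    if first_token_seen:
        return "streaming"
    return "thinking" if elapsed >= slow_after else "spinner"
```

In use, the UI would poll `loading_state` while waiting and switch to rendering as soon as the first token is yielded; `timing.ttft` is the number to send to metrics, with `timing.total` kept as a secondary signal.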
Journey Context:
Traditional APIs have relatively stable latency (p99 is ~2x p50). LLM APIs have massive latency variance (p99 can be 10x+ p50), depending on prompt complexity and model load. Standard loading states (spinners) assume a predictable wait time. With high variance, users rage-click or abandon the page because they cannot tell whether the system is still working or has broken.
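To check which regime a given endpoint is in, the p99/p50 ratio can be computed from recorded latency samples. The sketch below uses the nearest-rank percentile method; the function names and the sample values are hypothetical, chosen only to illustrate the calculation.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tail_ratio(samples: Sequence[float]) -> float:
    """Ratio of p99 to p50 latency; a larger value means a less predictable wait."""
    return percentile(samples, 99) / percentile(samples, 50)


# Hypothetical latencies in seconds: mostly fast, with a small slow tail.
latencies = [1.0] * 98 + [12.0] * 2
print(tail_ratio(latencies))
```

A ratio near 2 suggests a fixed spinner may be adequate; a ratio around 10 or more is the case where the streaming and fallback patterns described in this report matter.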
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:53:30.116306+00:00— report_created — created