Report #47532
[cost_intel] At what specific latency threshold do reasoning models cause user abandonment in synchronous interfaces?
Never deploy o1/o3 in user-facing paths where a response is expected in under 5 seconds. Reasoning models operate at 10-60s latency, which violates the 1-second flow limit and causes 40%+ abandonment, whereas GPT-4o maintains sub-1s responses suitable for live coding assistants.
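One way to enforce this rule is a latency-budget guard in front of model selection. The sketch below is illustrative only: the latency table reuses the rough figures quoted in this report, and `pick_model`, `TYPICAL_LATENCY_S`, the model keys, and the preference order are hypothetical names and assumptions, not part of any SDK.

```python
# Typical latencies in seconds, taken from the rough figures quoted in this
# report (assumption: re-measure these against your own traffic).
TYPICAL_LATENCY_S = {
    "gpt-4o": 1.0,
    "o3-mini-low": 5.0,
    "o1-mini": 10.0,
    "o1-preview": 30.0,
}

# Preference order, slowest reasoning model first (an illustrative assumption).
PREFERENCE = ["o1-preview", "o1-mini", "o3-mini-low", "gpt-4o"]


def pick_model(latency_budget_s: float) -> str:
    """Return the most-preferred model whose typical latency is under the budget."""
    for name in PREFERENCE:
        if TYPICAL_LATENCY_S[name] < latency_budget_s:
            return name
    # Nothing fits the budget: fall back to the fastest model instead of blocking.
    return min(TYPICAL_LATENCY_S, key=TYPICAL_LATENCY_S.get)
```

With these numbers, any budget of 5 seconds or less resolves to the fast model, which matches the recommendation above; only batch or asynchronous paths with generous budgets reach the reasoning models.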
Journey Context:
HCI research establishes 1s as the limit for maintaining user flow. o1-preview averages 30s, o1-mini ~10s, and o3-mini (low) ~5s, which is still too slow for as-you-type interaction. The "latency cliff" is binary: once a chat interface exceeds 2s, perceived quality drops sharply regardless of actual answer quality. Common mistake: assuming that "streaming the thinking tokens helps". It doesn't; users need the final answer. Alternative architecture: GPT-4o for the draft, o3-mini for background refinement.
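The draft-then-refine architecture can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `fast_draft` and `slow_refine` are hypothetical stand-ins that simulate model calls with short sleeps (no real API is called), and `on_refined` is an assumed callback through which the UI would swap in the refined answer.

```python
import asyncio


async def fast_draft(prompt: str) -> str:
    # Hypothetical stand-in for a low-latency model call (e.g. GPT-4o).
    await asyncio.sleep(0.01)
    return f"draft: {prompt}"


async def slow_refine(prompt: str, draft: str) -> str:
    # Hypothetical stand-in for a slower reasoning-model call (e.g. o3-mini).
    await asyncio.sleep(0.05)
    return f"refined answer for: {prompt}"


async def _refine_and_notify(prompt: str, draft: str, on_refined) -> None:
    refined = await slow_refine(prompt, draft)
    on_refined(refined)  # the UI would replace the draft here


async def answer(prompt: str, on_refined):
    """Return the draft immediately; refinement continues as a background task."""
    draft = await fast_draft(prompt)
    task = asyncio.create_task(_refine_and_notify(prompt, draft, on_refined))
    return draft, task


async def demo(prompt: str) -> list:
    events = []
    draft, task = await answer(prompt, lambda text: events.append(("refined", text)))
    events.append(("draft", draft))  # the user sees the draft first
    await task                       # the refined answer arrives later
    return events
```

The point of the design is that the user-facing path only ever waits on the fast model; the reasoning model's latency is paid off the critical path.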
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:15:45.650045+00:00— report_created — created