Report #57155
[cost_intel] Latency cliff making o1 unusable in synchronous chat UX
Cap response time at 3s for synchronous chat. Use GPT-4o with streaming so the first tokens appear in under 1s, and restrict o1 to asynchronous background analysis.
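Here is a minimal sketch of enforcing the 3s cap, assuming a Python asyncio backend. `simulated_model`, `answer_within_budget` and `demo` are hypothetical names. The model call is a stand-in that only sleeps, not a real OpenAI client call. Latencies are scaled down 10x (0.3s budget, 0.1s fast model, 2s slow model) so the demo runs quickly.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

SYNC_BUDGET_S = 3.0  # the report's cap for a synchronous chat turn


async def simulated_model(prompt: str, latency_s: float) -> str:
    # Hypothetical stand-in for a real model call; it only sleeps to mimic latency.
    await asyncio.sleep(latency_s)
    return f"answer to {prompt!r}"


async def answer_within_budget(
    call: Callable[[], Awaitable[str]], budget_s: float = SYNC_BUDGET_S
) -> Optional[str]:
    """Return the answer, or None if the call misses the synchronous budget."""
    try:
        return await asyncio.wait_for(call(), timeout=budget_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the slow call at this point.
        return None


async def demo() -> Tuple[Optional[str], Optional[str]]:
    # Latencies scaled down 10x: 0.3 s budget, 0.1 s fast model, 2 s slow model.
    fast = await answer_within_budget(lambda: simulated_model("hi", 0.1), budget_s=0.3)
    slow = await answer_within_budget(lambda: simulated_model("hi", 2.0), budget_s=0.3)
    return fast, slow


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

When a call misses the budget, `asyncio.wait_for` cancels it and the sketch returns `None`. A real handler would then pass the request to background processing instead of keeping the user waiting.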
Journey Context:
o1-preview averages 15-30s per completion, compared with 0.5-2s for GPT-4o. Nielsen's response-time limits (0.1s feels instant, 1s preserves the user's flow of thought, 10s is the limit for holding attention) also apply to AI UX. In chat interfaces, users perceive waits over 3s as 'broken'. Streaming o1's internal reasoning tokens doesn't help, because the user still has to wait for the final answer. The cliff is effectively binary: a response lands either under 3s (usable) or over 10s (unusable). Asynchronous patterns, where o1 drafts and the user reviews later, are the only viable alternative to synchronous use.
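The draft-then-review pattern could look roughly like the sketch below, again assuming asyncio and an in-process job table. `DraftQueue` and its methods are hypothetical. The slow call only sleeps to mimic o1's latency, scaled down for the demo. A production system would more likely use a persistent job queue so that drafts survive restarts.

```python
import asyncio
import itertools
from typing import Dict, Optional, Tuple


class DraftQueue:
    """Hypothetical in-process job table: slow drafts run in the background
    while the chat turn returns immediately with a job id."""

    def __init__(self, latency_s: float) -> None:
        self._latency_s = latency_s
        self._ids = itertools.count(1)
        self._tasks: Dict[int, "asyncio.Task[str]"] = {}

    async def _slow_draft(self, prompt: str) -> str:
        # Stand-in for a slow reasoning-model call; it only sleeps.
        await asyncio.sleep(self._latency_s)
        return f"draft for {prompt!r}"

    def submit(self, prompt: str) -> int:
        # Schedules the draft and returns at once; needs a running event loop.
        job_id = next(self._ids)
        self._tasks[job_id] = asyncio.create_task(self._slow_draft(prompt))
        return job_id

    def status(self, job_id: int) -> str:
        return "ready" if self._tasks[job_id].done() else "pending"

    def result(self, job_id: int) -> Optional[str]:
        # Returns None while the draft is still running.
        task = self._tasks[job_id]
        return task.result() if task.done() else None


async def demo() -> Tuple[str, str, Optional[str]]:
    queue = DraftQueue(latency_s=0.2)  # stands in for o1's 15-30 s, scaled down
    job = queue.submit("summarize Q3 costs")
    before = queue.status(job)  # the chat turn would return here without waiting
    await asyncio.sleep(0.4)  # the user keeps working; the draft finishes meanwhile
    return before, queue.status(job), queue.result(job)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The chat turn never blocks on the slow model. It only hands back a job id, and the UI can poll `status` or receive a notification once the draft is ready for review.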
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:25:31.471503+00:00 - report_created - created