Report #96157
[cost_intel] When does o1's 30-second thinking time break synchronous user experience?
Never use full o1/o3 in synchronous chat UX, where the latency budget is about 2s. Use o1-mini when users can tolerate waits under 10s, or switch to an async 'background thinking' pattern in which 4o sends an immediate ACK.
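The rule above can be sketched as a small routing function. This is a minimal illustration, not a definitive policy: the complexity labels, the strategy names, and the 5-second cutoff (taken from the upper end of the 3-5s o1-mini figure quoted in this report) are assumptions made for the example, and none of the strings are API model identifiers.

```python
# Hypothetical routing sketch. Thresholds come from the figures quoted in this
# report; the labels and strategy names are illustrative assumptions.
O1_MINI_WORST_CASE_S = 5.0  # upper end of the 3-5s range quoted for o1-mini


def choose_strategy(complexity: str, latency_budget_s: float) -> str:
    """Pick a serving strategy for one chat turn.

    complexity: "simple", "medium", or "hard" (assumed to be classified upstream).
    latency_budget_s: how long the user will wait for a synchronous reply.
    """
    if complexity == "simple":
        # No reasoning model needed: stream from the fast model.
        return "stream_fast"
    if complexity == "medium" and latency_budget_s >= O1_MINI_WORST_CASE_S:
        # The budget covers the small reasoning model's typical latency.
        return "mini_sync"
    # Hard prompts, or budgets too tight for any reasoning model:
    # acknowledge immediately and run the reasoning job in the background.
    return "ack_then_async"
```

For example, `choose_strategy("medium", 8.0)` selects the synchronous o1-mini path, while `choose_strategy("hard", 8.0)` falls through to the ACK-plus-background path, because a full reasoning model is never served synchronously under this rule.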
Journey Context:
Anthropic's research on building effective agents is cited as establishing a 1-2 second latency cliff for synchronous chat UX; beyond this, user perception shifts from 'responsive' to 'delayed'. o1-preview averages 15-30s of thinking time (up to minutes for hard prompts). This creates a 'latency-cost valley': you pay 30x for a degraded UX. A common mistake is upgrading a chatbot from 4o to o1 in the expectation of faster or better answers; users abandon the session instead. The degradation signature is a 'thinking...' spinner that lasts longer than 10s. The solution is architectural: use 4o for an immediate streaming response and offload hard reasoning to an async o1 job, or use o1-mini (3-5s) for medium-complexity prompts.
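The async 'background thinking' pattern described above can be sketched with asyncio. This is a minimal sketch under stated assumptions: `fast_ack` and `slow_reasoning` are hypothetical stand-ins that simulate model calls with short sleeps (no real API is called), and `deliver` is an assumed callback that pushes text to the user.

```python
import asyncio
from typing import Callable


async def fast_ack(prompt: str) -> str:
    # Hypothetical stand-in for a low-latency model call (4o in this report).
    await asyncio.sleep(0.01)
    return f"Working on it: {prompt}"


async def slow_reasoning(prompt: str) -> str:
    # Hypothetical stand-in for a reasoning model call. The report quotes
    # 15-30s for o1-preview; the delay is shortened here so the demo is quick.
    await asyncio.sleep(0.05)
    return f"Detailed answer for: {prompt}"


async def handle_message(prompt: str, deliver: Callable[[str], None]) -> asyncio.Task:
    async def think_then_deliver() -> None:
        # Runs in the background and pushes the result when it is ready.
        deliver(await slow_reasoning(prompt))

    # Start the slow job first so it overlaps with producing the ACK.
    job = asyncio.create_task(think_then_deliver())
    # The user sees the acknowledgement without waiting for the reasoning job.
    deliver(await fast_ack(prompt))
    return job


async def demo() -> list[str]:
    delivered: list[str] = []
    job = await handle_message("plan a migration", delivered.append)
    await job  # a real server would keep serving other requests instead
    return delivered


if __name__ == "__main__":
    for message in asyncio.run(demo()):
        print(message)
```

The design choice is that the handler returns as soon as the ACK is delivered, so the perceived latency is that of the fast model, while the reasoning result arrives later as a second message.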
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:58:46.301839+00:00— report_created — created