Report #75674
[cost_intel] Using o1/o3 in synchronous UX requiring sub-2-second response times
Restrict o1/o3 to asynchronous background jobs (code review, document analysis, research synthesis); never use them for chatbots, live autocomplete, or real-time gaming.
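In practice, the rule above is a per-request routing decision. The sketch below is a minimal illustration under stated assumptions: the `Job` record, its `interactive` flag, the `choose_model` helper, and the use of `gpt-4o` as the fallback for interactive paths are all hypothetical and do not come from any real SDK. It keeps the preferred model for background jobs and swaps any o1/o3-family model for the fast fallback whenever a user is waiting on the response.

```python
from dataclasses import dataclass

# Model-name prefixes that the report restricts to background jobs.
REASONING_PREFIXES = ("o1", "o3")

# Hypothetical fast model for interactive paths (an assumption, not a rule from the report).
FALLBACK_MODEL = "gpt-4o"


@dataclass
class Job:
    name: str
    interactive: bool      # True when a user is waiting (chat, autocomplete, game NPC)
    preferred_model: str


def is_reasoning_model(model: str) -> bool:
    """True for o1/o3-family names such as 'o1-mini', 'o1-preview', or 'o3'."""
    return model.startswith(REASONING_PREFIXES)


def choose_model(job: Job) -> str:
    """Allow reasoning models only for background jobs; downgrade interactive ones."""
    if job.interactive and is_reasoning_model(job.preferred_model):
        return FALLBACK_MODEL
    return job.preferred_model


if __name__ == "__main__":
    print(choose_model(Job("live support chat", True, "o1-mini")))        # gpt-4o
    print(choose_model(Job("nightly code review", False, "o1-preview")))  # o1-preview
```

Routing on a simple "is a user waiting?" flag, rather than on measured latency, mirrors the report's framing: the reasoning models are excluded from synchronous paths categorically, not case by case.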
Journey Context:
o1-mini averages 5-30 seconds per completion, and o1-preview averages 30-120 seconds on complex reasoning tasks. Even the fastest of these is well past a 2-second interactive budget, which creates a 'latency cliff' beyond which user engagement drops sharply. In production A/B tests, replacing GPT-4o with o1 in a chat interface reduced session duration by 85% and increased abandonment. The model is architecturally unsuited to synchronous UX because it is designed for 'batch' reasoning. Use it for nightly code review batches, async research reports, and background theorem proving. Never use it for live customer support, IDE autocomplete, or interactive gaming NPCs.
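For the background use cases, a common shape is submit-then-poll: the request handler schedules the slow reasoning call, returns a job id immediately, and the client checks back later instead of blocking. The following is a minimal in-process sketch using `asyncio`. `slow_reasoning_call` is a hypothetical stub that sleeps briefly in place of a real o1/o3 request, and `BackgroundJobs` is an illustrative name, not a library class. A real nightly batch would more likely run under a scheduler with a durable queue, which this sketch does not model.

```python
import asyncio
import uuid


async def slow_reasoning_call(prompt: str, delay_s: float = 0.05) -> str:
    """Hypothetical stand-in for an o1/o3 call (5-120 s per the report); sleeps briefly here."""
    await asyncio.sleep(delay_s)
    return f"analysis of: {prompt}"


class BackgroundJobs:
    """Minimal in-memory job store: submit returns at once, results arrive later."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}

    def submit(self, prompt: str) -> str:
        job_id = uuid.uuid4().hex
        # create_task schedules the call without awaiting it, so the caller
        # (for example an HTTP handler) can answer with the job id right away.
        self._tasks[job_id] = asyncio.create_task(slow_reasoning_call(prompt))
        return job_id

    def status(self, job_id: str) -> str:
        return "done" if self._tasks[job_id].done() else "pending"

    def result(self, job_id: str) -> str:
        # Only valid once status() reports "done".
        return self._tasks[job_id].result()


async def main() -> tuple[str, str, str]:
    jobs = BackgroundJobs()
    job_id = jobs.submit("review PR diff")
    first = jobs.status(job_id)   # still pending: the task has not run yet
    await asyncio.sleep(0.1)      # the user is free to do other work meanwhile
    return first, jobs.status(job_id), jobs.result(job_id)


outcome = asyncio.run(main())
print(outcome)  # ('pending', 'done', 'analysis of: review PR diff')
```

The point of the design is that the user-facing request never waits on the reasoning model; only the poll (or a later notification) touches the result.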
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:36:40.566574+00:00— report_created — created