Report #77399
[cost_intel] Why does o1-mini cause 15-second UI hangs in chat applications despite being 'mini'?
Avoid o1-mini for synchronous chat UX, where a time-to-first-token (TTFT) above 10s causes user abandonment. Instead, use GPT-4o (TTFT <1s) with streaming, or offload reasoning to asynchronous background processing with polling.
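The "offload to background processing with polling" option could look roughly like the sketch below. It is not tied to any particular SDK: `slow_reasoning_call` is a hypothetical stand-in for the real model request, and `ReasoningJobs`, `submit`, and `poll` are illustrative names, not part of any library. The chat handler would return the job id immediately and the UI would poll until the status changes.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor


def slow_reasoning_call(prompt: str) -> str:
    """Hypothetical stand-in for a slow reasoning-model request."""
    time.sleep(0.2)  # simulated latency; the report cites 10-30s for real calls
    return f"reasoned answer for: {prompt}"


class ReasoningJobs:
    """In-memory job registry (assumption: a single process, no persistence)."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: dict[str, Future] = {}

    def submit(self, prompt: str) -> str:
        # Returns at once so the UI thread never waits on the model.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(slow_reasoning_call, prompt)
        return job_id

    def poll(self, job_id: str) -> dict:
        # Non-blocking status check, suitable for a polling endpoint.
        future = self._jobs[job_id]
        if not future.done():
            return {"status": "pending"}
        error = future.exception()
        if error is not None:
            return {"status": "failed", "error": str(error)}
        return {"status": "done", "result": future.result()}


if __name__ == "__main__":
    jobs = ReasoningJobs()
    job_id = jobs.submit("Plan a database migration")
    while jobs.poll(job_id)["status"] == "pending":
        time.sleep(0.05)  # a real client would poll far less often
    print(jobs.poll(job_id))
```

A production version would likely need a durable queue and job expiry; the in-memory dictionary here only illustrates the submit/poll split.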
Journey Context:
o1-mini costs less than o1-preview but has similar reasoning latency (10-30s) because it still performs chain-of-thought internally before it returns any visible output. Developers mistake 'mini' for 'fast'. In chat, 15s of dead air kills the UX. The pattern: if the user is waiting, use GPT-4o; if the task needs reasoning, make it async (email, batch job) or use 'reasoning on demand', where GPT-4o tries first and o1 corrects the answer only if needed.
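A minimal sketch of the 'reasoning on demand' routing, assuming the models are passed in as plain callables. `answer_on_demand`, `Routed`, and the `looks_wrong` check are hypothetical names introduced for illustration; how to decide that a draft needs correction is left open by the report and is an assumption here.

```python
from dataclasses import dataclass
from typing import Callable

ModelFn = Callable[[str], str]          # prompt -> answer text
CheckFn = Callable[[str, str], bool]    # (prompt, draft) -> needs correction?


@dataclass
class Routed:
    text: str
    source: str  # "fast" or "reasoning"


def answer_on_demand(
    prompt: str,
    fast_model: ModelFn,
    reasoning_model: ModelFn,
    looks_wrong: CheckFn,
) -> Routed:
    """Try the low-latency model first; escalate only when the draft is flagged."""
    draft = fast_model(prompt)
    if not looks_wrong(prompt, draft):
        return Routed(draft, "fast")
    # Escalation path: this call is the slow one, so a chat UI would
    # normally run it in the background rather than block the user.
    corrected = reasoning_model(
        f"{prompt}\n\nDraft answer to check and correct:\n{draft}"
    )
    return Routed(corrected, "reasoning")


if __name__ == "__main__":
    # Stub models for demonstration only; real ones would call an API.
    result = answer_on_demand(
        "What is 2 + 2?",
        fast_model=lambda p: "4",
        reasoning_model=lambda p: "4 (verified)",
        looks_wrong=lambda p, d: not d.strip(),
    )
    print(result)
```

Passing the models in as callables keeps the routing logic testable without network access, and lets the slow path be swapped for a background job.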
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:30:37.255599+00:00— report_created — created