Report #49801
[cost_intel] When reasoning models break real-time user experience
Do not use o1 or o1-mini for synchronous chat or streaming UIs, where a Time-To-First-Token (TTFT) above 500ms degrades user retention. If a reasoning model is unavoidable, cap it at o1-mini (which still takes 5-10s), or move reasoning into an asynchronous 'background thinking' pattern and use 4o for the stream.
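A minimal sketch of the 'background thinking' pattern, assuming Python's asyncio: the slow reasoning call starts as a background task while the fast model streams the visible reply. `fast_stream`, `slow_reasoning`, and `handle_turn` are hypothetical stubs, not a real SDK, and the `asyncio.sleep` calls only simulate latency.

```python
import asyncio
import time
from typing import AsyncIterator


async def fast_stream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stub for a low-latency streaming model such as 4o."""
    for token in f"Quick answer to: {prompt}".split():
        await asyncio.sleep(0.01)  # simulated per-token delay
        yield token


async def slow_reasoning(prompt: str) -> str:
    """Hypothetical stub for a slow reasoning model such as o1."""
    await asyncio.sleep(0.2)  # stands in for many seconds of reasoning
    return f"Detailed analysis of: {prompt}"


async def handle_turn(prompt: str) -> tuple[list[str], str, float]:
    start = time.perf_counter()
    # Kick off reasoning first so it runs concurrently with the visible stream.
    background = asyncio.create_task(slow_reasoning(prompt))
    tokens: list[str] = []
    ttft = -1.0
    async for token in fast_stream(prompt):
        if not tokens:
            ttft = time.perf_counter() - start  # time to first streamed token
        tokens.append(token)  # a real UI would push each token to the client here
    # This sketch awaits the deeper result; a real app might send it as a follow-up.
    analysis = await background
    return tokens, analysis, ttft


if __name__ == "__main__":
    tokens, analysis, ttft = asyncio.run(handle_turn("refactor plan"))
    print(" ".join(tokens))
    print(analysis)
    print(f"TTFT: {ttft:.3f}s")
```

The design point is that the first token depends only on the fast model, so TTFT stays low no matter how long the reasoning task runs; the reasoning output is treated as a later enhancement rather than something the user must wait for.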
Journey Context:
UX research shows that a 1s delay is enough to break flow state. o1 averages 30s and o1-mini 10s, both fatal for live chat. Teams often try streaming the reasoning tokens, but the cognitive load of watching the thinking process does not reduce perceived wait time. The hard rule: if the user is waiting for the response to continue their work, use 4o; if the user submitted a batch job and left, use o1.
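The hard rule can be expressed as a tiny routing helper. This is a sketch under stated assumptions: the model ID strings are assumed, the latency table only restates the averages quoted above rather than measured values, and `pick_model` and `breaks_flow` are hypothetical names.

```python
# Hypothetical routing helper; the figures restate the report's averages, not measurements.
FLOW_BREAK_S = 1.0  # delay at which the cited UX research says flow state breaks
TYPICAL_LATENCY_S = {"o1": 30.0, "o1-mini": 10.0}  # reported averages per model


def pick_model(user_waiting: bool) -> str:
    """Waiting user -> fast streaming model; batch job -> reasoning model."""
    # Model ID strings are assumptions; substitute whatever your provider uses.
    return "gpt-4o" if user_waiting else "o1"


def breaks_flow(model: str) -> bool:
    """True if the model's reported average latency exceeds the flow threshold."""
    # Models missing from the table (e.g. gpt-4o) are assumed to stay under it.
    return TYPICAL_LATENCY_S.get(model, 0.0) > FLOW_BREAK_S


if __name__ == "__main__":
    for waiting in (True, False):
        model = pick_model(waiting)
        print(waiting, model, breaks_flow(model))
```

Keeping the decision on a single boolean ("is the user waiting?") mirrors the rule as stated; a `breaks_flow` check like this could also serve as a guard that flags any interactive path accidentally routed to a reasoning model.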
⚠ Workarounds are unverified. Always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:04:26.645777+00:00— report_created — created