Report #79750
[cost_intel] Reasoning model latency breaking synchronous chat UX
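Never use o1/o3 for chat that requires under 2 s time to first token (TTFT). If you need under 500 ms, use GPT-4o-mini. If the request genuinely needs a reasoning model, show an async 'reasoning in progress' indicator in the UI instead of blocking. Reasoning models have a hard latency floor of 3-10 seconds of thinking time before the first token.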
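The routing rule above can be sketched as a small function. This is a minimal sketch, assuming the caller knows its TTFT budget and whether the request actually needs a reasoning model. `route_request`, `Route`, and the threshold constants are hypothetical names; the latency figures are this report's planning numbers, not measurements, and the function only returns labels without calling any API.

```python
from dataclasses import dataclass

# Model labels mentioned in this report (no API call is made here).
FAST_MODEL = "gpt-4o-mini"
REASONING_MODEL = "o3"
# Hypothetical threshold: upper end of the 5-30 s reasoning latency in this report.
REASONING_WORST_CASE_S = 30.0


@dataclass
class Route:
    model: str
    show_progress_indicator: bool  # render an async 'reasoning in progress' state


def route_request(ttft_budget_s: float, needs_reasoning: bool) -> Route:
    """Hypothetical router: pick a model and UI mode for a TTFT budget."""
    if not needs_reasoning:
        # Fast model: the report quotes sub-500 ms latency for GPT-4o-mini.
        return Route(FAST_MODEL, show_progress_indicator=False)
    if ttft_budget_s >= REASONING_WORST_CASE_S:
        # Budget covers even slow thinking time, so a blocking call is acceptable.
        return Route(REASONING_MODEL, show_progress_indicator=False)
    # Budget is tighter than thinking time: keep the reasoning model,
    # but never block the chat; switch the UI to an async indicator.
    return Route(REASONING_MODEL, show_progress_indicator=True)


if __name__ == "__main__":
    print(route_request(2.0, needs_reasoning=False))
    print(route_request(2.0, needs_reasoning=True))
```

The design choice is to keep the reasoning model whenever the task needs it and change only the UI contract, rather than silently downgrading quality to meet the budget.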
Journey Context:
Reasoning models generate an internal chain of thought before emitting any output tokens, which creates a 5-30 s latency cliff. Product teams often prototype with a fast model and then swap in a reasoning model, which destroys the chat UX. Alternatives: move reasoning into async workflows (e.g. email generation), or use a hybrid pattern in which a cheap model drafts the response and a reasoning model verifies it.
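The hybrid cheap-draft plus reasoning-verify flow can be sketched with `asyncio`. This is a minimal sketch, assuming a UI that accepts pushed events through an `emit` callback. `cheap_draft`, `reasoning_verify`, and `answer` are hypothetical stubs that sleep instead of calling real models, and the toy verifier simply approves the draft.

```python
import asyncio
from typing import Callable


# Hypothetical stand-ins for real model calls; they only sleep to mimic
# the latency gap between a fast model and a reasoning model.
async def cheap_draft(prompt: str) -> str:
    await asyncio.sleep(0.01)  # fast model: sub-second
    return f"draft answer to: {prompt}"


async def reasoning_verify(prompt: str, draft: str) -> str:
    await asyncio.sleep(0.05)  # stands in for 5-30 s of thinking time
    # Toy verifier: accepts the draft and marks it as verified.
    return draft + " [verified]"


async def answer(prompt: str, emit: Callable[[str, str], None]) -> str:
    """Show a cheap draft at once, then replace it with the verified answer."""
    draft = await cheap_draft(prompt)
    emit("draft", draft)  # user sees text within the fast model's latency
    emit("status", "reasoning in progress")  # async UI indicator
    final = await reasoning_verify(prompt, draft)
    emit("final", final)  # UI swaps the draft for the verified text
    return final


if __name__ == "__main__":
    asyncio.run(answer("summarise the ticket", lambda kind, text: print(kind, text)))
```

Because the draft is emitted before the slow call is awaited, the reasoning model's thinking time only delays the verified replacement, not the first visible text.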
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:27:36.256541+00:00— report_created — created