Report #56369
[cost_intel] Using reasoning models in synchronous chat UX without latency budgeting
Restrict reasoning models to async workflows (email generation, code review) where latency above 5s is acceptable; for chat, use GPT-4o with tool use or streaming.
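A minimal sketch of that routing rule, assuming each request carries a latency budget in seconds. The function name `pick_model` and the `wants_reasoning` flag are hypothetical; the 5s threshold and the two model names come from this report, and nothing here calls a real API.

```python
# Threshold taken from the report: reasoning models only where >5s is acceptable.
ASYNC_LATENCY_THRESHOLD_S = 5.0

# Model names as mentioned in the report; treat them as placeholders.
REASONING_MODEL = "o1-preview"
FAST_MODEL = "gpt-4o"


def pick_model(latency_budget_s: float, wants_reasoning: bool) -> str:
    """Return the reasoning model only when the caller can tolerate >5s latency.

    latency_budget_s: how long the caller is willing to wait (hypothetical field).
    wants_reasoning:  whether the task would benefit from a reasoning model.
    """
    if wants_reasoning and latency_budget_s > ASYNC_LATENCY_THRESHOLD_S:
        return REASONING_MODEL
    # Chat, search, and anything at or under the threshold stays on the fast model.
    return FAST_MODEL
```

For example, a chat turn with a 2s budget would be routed to the fast model even if the task looks reasoning-heavy, while a background code review with a 60s budget could use the reasoning model.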
Journey Context:
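o1-preview averages 8-15s time-to-first-token (TTFT) vs GPT-4o's 0.5s. The often-quoted figure that 53% of users abandon after a 3s delay comes from a Google/DoubleClick study of mobile page loads, not NNGroup; NNGroup's response-time guidance puts about 1s as the limit for keeping a user's flow of thought uninterrupted and about 10s as the limit for holding attention. Either way, an 8-15s wait falls well past this 'latency cliff', which makes reasoning models unusable for live copilot suggestions. Chain-of-thought visibility doesn't compensate for the jarring pause in conversational flow. Critical threshold: if users expect a response in under 2s (chat, search), reasoning models are architecturally incompatible regardless of quality gains.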
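The TTFT figures above are worth checking against your own traffic before acting on them. The sketch below is one way to measure it, assuming the model response is available as any iterable of text chunks; `measure_ttft` and `exceeds_chat_budget` are hypothetical helpers, and the 2s default is the chat threshold quoted in this report.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_ttft(
    stream: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[float], str]:
    """Consume a chunk stream and return (seconds until first non-empty chunk, full text).

    Returns None for the TTFT if the stream never yields a non-empty chunk.
    """
    start = clock()
    ttft: Optional[float] = None
    parts = []
    for chunk in stream:
        if ttft is None and chunk:
            # First visible output: record elapsed time once.
            ttft = clock() - start
        parts.append(chunk)
    return ttft, "".join(parts)


def exceeds_chat_budget(ttft_s: Optional[float], budget_s: float = 2.0) -> bool:
    """True when no token arrived at all or the first token arrived after the budget."""
    return ttft_s is None or ttft_s > budget_s
```

The `clock` parameter exists only so the timing can be faked in tests; in normal use the default monotonic clock is enough.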
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:06:29.119668+00:00— report_created — created