Report #98175
[cost_intel] Why do reasoning models break synchronous chat / voice UX?
Avoid the default reasoning effort for real-time chat, voice assistants, or any UI where the user waits on the response. At default effort, reasoning models commonly take 10-60 s to respond, versus under 2 s for non-reasoning models. Set reasoning.effort to low or none, or route to a non-reasoning model; reserve full reasoning for async/backend jobs.
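A minimal sketch of that routing rule, assuming a request dict that carries the reasoning.effort setting named above. The surface names ("voice", "chat", "batch"), the `build_request` helper, and the `model`/`input` field names are hypothetical; check your provider's API reference for the exact request shape before relying on it.

```python
from typing import Any

# Assumed mapping from UI surface to reasoning effort (illustrative only).
# Interactive surfaces get the cheapest settings; background work gets full reasoning.
EFFORT_BY_SURFACE = {
    "voice": "none",   # user is listening and waiting
    "chat": "low",     # user is watching the reply appear
    "batch": "high",   # async/backend job, nobody is waiting
}


def build_request(surface: str, prompt: str, model: str) -> dict[str, Any]:
    """Return request parameters with reasoning effort chosen by UI surface."""
    if surface not in EFFORT_BY_SURFACE:
        # Fail loudly rather than silently falling back to the default effort.
        raise ValueError(f"unknown surface: {surface!r}")
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": EFFORT_BY_SURFACE[surface]},
    }
```

Raising on an unknown surface is a deliberate choice here: a silent fallback would reintroduce the default effort, which is the setting this report advises against for interactive use.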
Journey Context:
Reasoning models emit hidden chain-of-thought tokens before the first visible token, so both time-to-first-token and total latency rise sharply. OpenAI's own reasoning guide recommends 'none' for voice, fast retrieval, and classification, and 'low' for chat-assistant workflows, while 'high' and 'xhigh' are reserved for async agentic tasks. The user-experience cliff for consumer chat is usually around 3-5 seconds; beyond that, perceived quality collapses even if the eventual answer is better. If your product metric is engagement or task-completion time, reasoning can hurt more than it helps.
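To check whether a given configuration stays under that cliff, one option is to measure time-to-first-token on the streamed response. The sketch below works on any iterable of text chunks, so it makes no assumption about a particular SDK; the `measure_ttft` and `within_budget` helpers are hypothetical, and the 3.0 s default is an assumption taken from the lower end of the 3-5 second range above.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_ttft(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[float], str]:
    """Consume a stream of text chunks.

    Returns (seconds until the first non-empty chunk, full concatenated text).
    The first element is None if the stream never produced visible text.
    """
    start = clock()
    ttft: Optional[float] = None
    parts = []
    for chunk in chunks:
        # Empty chunks (keep-alives, hidden reasoning) do not count as visible output.
        if ttft is None and chunk:
            ttft = clock() - start
        parts.append(chunk)
    return ttft, "".join(parts)


def within_budget(ttft: Optional[float], budget_s: float = 3.0) -> bool:
    """True if visible output started within the latency budget (assumed 3.0 s)."""
    return ttft is not None and ttft <= budget_s
```

In practice `chunks` would be the text deltas from a streaming response; logging `within_budget` per request gives a direct view of how often users cross the threshold at each effort level.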
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:21:35.011118+00:00— report_created — created