Report #52427
[gotcha] Extended thinking and reasoning models create 10-60+ second latency before any visible output, making the product appear frozen or broken
When using extended thinking modes, show an explicit thinking indicator, not a generic spinner, during the reasoning phase. Stream thinking tokens if your UX supports it. Set timeouts to allow for extended thinking, which can take 30+ seconds before the first response token. Never reuse the spinner you show while waiting on the network: users cannot tell it apart from a hang, so they refresh the page or abandon the session, assuming the app crashed.
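A minimal sketch of how a client might tell the reasoning phase apart from the answer phase while streaming, so the UI can show a dedicated thinking indicator. The event dictionaries are an assumption modeled on Anthropic-style streaming events (`content_block_delta` carrying a `thinking_delta` or `text_delta`); check your provider's actual event schema. `Phase`, `StreamView`, and `indicator_label` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    CONNECTING = "connecting"  # request sent, nothing received yet
    THINKING = "thinking"      # reasoning tokens arriving, no answer text yet
    RESPONDING = "responding"  # visible answer text is streaming
    DONE = "done"


@dataclass
class StreamView:
    """Tracks which phase the stream is in so the UI can pick an indicator."""
    phase: Phase = Phase.CONNECTING
    thinking_text: str = ""
    answer_text: str = ""
    phase_log: list = field(default_factory=list)

    def _set_phase(self, phase: Phase) -> None:
        # Record only real transitions, not every token.
        if phase is not self.phase:
            self.phase = phase
            self.phase_log.append(phase)

    def handle(self, event: dict) -> None:
        # Assumed event shape; unknown events are ignored.
        kind = event.get("type")
        if kind == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "thinking_delta":
                self._set_phase(Phase.THINKING)
                self.thinking_text += delta.get("thinking", "")
            elif delta.get("type") == "text_delta":
                self._set_phase(Phase.RESPONDING)
                self.answer_text += delta.get("text", "")
        elif kind == "message_stop":
            self._set_phase(Phase.DONE)


def indicator_label(phase: Phase) -> str:
    """Return distinct copy per phase; the thinking label is never the generic one."""
    labels = {
        Phase.CONNECTING: "Connecting...",
        Phase.THINKING: "Thinking through the problem (this can take a while)",
        Phase.RESPONDING: "",  # answer text is visible, no indicator needed
        Phase.DONE: "",
    }
    return labels[phase]
```

In a real client, `handle` would be called for each event from the streaming response, and the UI would re-render the indicator whenever `phase` changes. If the UX supports it, `thinking_text` can be shown in a collapsible panel while the phase is `THINKING`.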
Journey Context:
Extended thinking models spend significant compute on reasoning before generating any output. Time-to-first-token can be 10-60+ seconds, far exceeding the 1-3 seconds typical of a standard LLM response. Users accustomed to near-instant streaming see a frozen UI and assume the app crashed, refresh the page (killing the request), or abandon the session. The counter-intuitive part: the model is working harder and producing better output, but the UX feels worse than a fast, low-quality response. The fix is not to disable thinking but to communicate state clearly. The worst pattern is using the same spinner for 'loading the page' and 'AI is thinking', because users cannot distinguish a bug from a feature. Anthropic's extended thinking docs explicitly recommend surfacing thinking state to manage user expectations.
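Given the gap between 1-3 seconds and 10-60+ seconds, one option is to keep separate timeout budgets for "waiting for the first event" and "stream went quiet", and to widen the first-event budget only when thinking is enabled. The sketch below illustrates that idea; the numeric budgets are illustrative assumptions, not recommended values, and `TimeoutPolicy`, `policy_for`, and `is_stalled` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeoutPolicy:
    first_event_s: float  # how long to wait for the first streamed event
    idle_s: float         # how long a started stream may stay silent


def policy_for(thinking_enabled: bool) -> TimeoutPolicy:
    # Assumed budgets: tune against latencies you actually observe.
    if thinking_enabled:
        return TimeoutPolicy(first_event_s=120.0, idle_s=30.0)
    return TimeoutPolicy(first_event_s=15.0, idle_s=15.0)


def is_stalled(
    policy: TimeoutPolicy,
    now: float,
    started_at: float,
    last_event_at: Optional[float],
) -> bool:
    """Return True only when the applicable budget has been exceeded."""
    if last_event_at is None:
        # Nothing received yet: apply the first-event budget.
        return now - started_at > policy.first_event_s
    # Stream has started: apply the idle budget since the last event.
    return now - last_event_at > policy.idle_s
```

With these assumed numbers, a request that has been silent for 45 seconds counts as stalled for a standard model but is still within budget when thinking is enabled, so the UI keeps showing the thinking indicator instead of an error.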
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:29:28.807460+00:00— report_created — created