Report #48818
[gotcha] Reasoning models appear frozen during extended thinking with no streaming output
For models with extended reasoning time, implement a distinct 'thinking' state with animated indicators, phased loading messages, or progress hints. Never use a generic spinner for waits exceeding 5 seconds. Show contextual messages like 'Analyzing your request...' or 'Working through this step by step...' with subtle animation.
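A minimal sketch of the phased-message idea, assuming the UI knows how long it has been waiting. The phase thresholds, the copy, and the names `thinkingMessage` and `shouldShowThinkingState` are hypothetical illustrations, not values prescribed by this report.

```typescript
// Hypothetical phase table: thresholds and copy are illustrative only.
interface Phase {
  afterMs: number; // show this message once elapsed time reaches afterMs
  message: string;
}

const THINKING_PHASES: Phase[] = [
  { afterMs: 0, message: "" }, // under 5 s: a plain spinner is still acceptable
  { afterMs: 5000, message: "Analyzing your request..." },
  { afterMs: 12000, message: "Working through this step by step..." },
  { afterMs: 25000, message: "Still thinking; complex requests can take a while..." },
];

// Return the message of the latest phase whose threshold has been reached.
// Assumes phases are sorted by ascending afterMs.
function thinkingMessage(elapsedMs: number, phases: Phase[] = THINKING_PHASES): string {
  let current = phases[0].message;
  for (const phase of phases) {
    if (elapsedMs >= phase.afterMs) {
      current = phase.message;
    } else {
      break;
    }
  }
  return current;
}

// Decide when to leave the generic spinner for the distinct 'thinking' state.
function shouldShowThinkingState(elapsedMs: number, thresholdMs = 5000): boolean {
  return elapsedMs >= thresholdMs;
}
```

In a UI this would typically be re-evaluated on a timer (for example once per second) while the request is pending, with a subtle animation attached to whichever message is current.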
Journey Context:
Reasoning models like OpenAI's o1 can spend 10-30+ seconds 'thinking' before producing any output tokens. During this time the streaming connection is open, but no content arrives. Showing a standard loading spinner or 'typing' indicator for that long makes users assume the app has frozen or crashed, so they tab away, refresh, or double-submit. The UX failure is using the same loading pattern for 200 ms completions and 30-second reasoning tasks. The key insight: users need progressive feedback that the system is still working, even when no tokens are being generated. The fix is to treat 'waiting for first token' as its own state (the wait could be fast or slow) and to apply progressive disclosure once that wait runs long.
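One way to separate 'waiting for first token' from streaming is a small state tracker driven by timestamps. This is a sketch under stated assumptions: the class `FirstTokenTracker`, its phase names, and the 5-second switch point are hypothetical, and the caller is assumed to call `onToken` for each content chunk from whatever streaming client it uses. Time is passed in explicitly so the logic stays testable.

```typescript
// Hypothetical request phases for a single model call.
type RequestPhase = "idle" | "awaitingFirstToken" | "thinking" | "streaming" | "done";

class FirstTokenTracker {
  private readonly thinkingAfterMs: number;
  private startedAt: number | null = null;
  private firstTokenAt: number | null = null;
  private finished = false;

  // thinkingAfterMs: how long to wait for the first token before switching to the 'thinking' UI.
  constructor(thinkingAfterMs = 5000) {
    this.thinkingAfterMs = thinkingAfterMs;
  }

  start(now: number): void {
    this.startedAt = now;
    this.firstTokenAt = null;
    this.finished = false;
  }

  // Call for every content chunk; only the first one is recorded.
  onToken(now: number): void {
    if (this.startedAt !== null && this.firstTokenAt === null) {
      this.firstTokenAt = now;
    }
  }

  finish(): void {
    this.finished = true;
  }

  phase(now: number): RequestPhase {
    if (this.startedAt === null) return "idle";
    if (this.finished) return "done";
    if (this.firstTokenAt !== null) return "streaming";
    // Connection is open but no content has arrived yet: fast or slow?
    return now - this.startedAt >= this.thinkingAfterMs ? "thinking" : "awaitingFirstToken";
  }

  // Disallow a new submission while a request is in flight (guards against double-submits).
  canSubmit(): boolean {
    return this.startedAt === null || this.finished;
  }

  // Time to first token in ms, or null if no token has arrived yet.
  timeToFirstTokenMs(): number | null {
    if (this.startedAt === null || this.firstTokenAt === null) return null;
    return this.firstTokenAt - this.startedAt;
  }
}
```

The UI can map `awaitingFirstToken` to a lightweight indicator, `thinking` to the distinct animated state, and `streaming` to the normal token rendering, while `canSubmit()` keeps the send button disabled until the request finishes.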
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:25:17.158545+00:00— report_created — created