Report #50445
[gotcha] Reasoning models create extended silent periods with no streamed tokens, causing users to abandon or reload
For reasoning models, show an animated thinking state with a time-elapsed indicator. Set user expectations upfront about potential wait times of 10–60+ seconds. Never use a static spinner; use progressive indicators that confirm the system is actively working. Consider implementing a cancel mechanism.
Journey Context:
Standard chat models stream tokens immediately, giving instant feedback. Reasoning models spend significant time, often 10–60+ seconds, on internal reasoning before emitting any output tokens. During this period the API typically returns no content, only keep-alive signals. A static spinner shown for 30+ seconds looks like a frozen system, so users refresh, double-submit, or abandon. The fix is progressive feedback: elapsed-time counters, animated states, and upfront expectation-setting. The tradeoff is that you cannot show what the model is thinking, because reasoning tokens are typically hidden by design, so you must substitute process indicators for content indicators. This is fundamentally different from the streaming UX patterns most developers are accustomed to, and it requires deliberate design rather than reuse of an existing chat streaming component.
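A minimal sketch of the progressive-feedback pattern described above, in TypeScript. All names here (thinkingStatus, runWithThinkingIndicator, ThinkingHandle) are hypothetical and do not come from any SDK. The 5-second and 30-second thresholds and the message wording are illustrative assumptions, not values from this report. The sketch assumes a runtime with a global AbortController, such as a modern browser or a recent Node.js.

```typescript
// Hypothetical helpers for illustration; not part of any vendor SDK.

/** Choose a status message that changes as the wait grows, so the UI never looks frozen. */
function thinkingStatus(elapsedMs: number): string {
  const seconds = Math.floor(elapsedMs / 1000);
  // Thresholds below are assumptions; tune them to your model's observed latency.
  if (seconds < 5) return `Thinking... ${seconds}s`;
  if (seconds < 30) return `Still reasoning... ${seconds}s (this can take up to a minute)`;
  return `Working on a longer answer... ${seconds}s (you can cancel at any time)`;
}

interface ThinkingHandle<T> {
  result: Promise<T>; // settles when the request finishes or is aborted
  cancel: () => void; // wire this to a Cancel button
}

/**
 * Run a long request while reporting elapsed time once per tick.
 * `start` receives an AbortSignal so the caller's request can honor cancellation.
 */
function runWithThinkingIndicator<T>(
  start: (signal: AbortSignal) => Promise<T>,
  onStatus: (message: string) => void,
  tickMs: number = 1000,
): ThinkingHandle<T> {
  const controller = new AbortController();
  const startedAt = Date.now();

  onStatus(thinkingStatus(0)); // show feedback immediately, before the first tick
  const timer = setInterval(
    () => onStatus(thinkingStatus(Date.now() - startedAt)),
    tickMs,
  );

  const result = (async () => {
    try {
      return await start(controller.signal);
    } finally {
      clearInterval(timer); // stop the counter on success, failure, or abort
    }
  })();

  return { result, cancel: () => controller.abort() };
}
```

In a real UI, onStatus would update a label next to an animated indicator, and start would pass the signal to the underlying request (for example, as the signal option of fetch) so that cancel actually stops the call instead of only hiding the indicator.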
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:09:28.261148+00:00— report_created — created