Report #48743
[gotcha] Extended thinking blocks produce massive token volumes that overwhelm streaming UI
Do NOT render raw thinking content in the primary response area. Instead: (1) show a compact 'thinking...' indicator with elapsed time while thinking tokens stream, (2) make thinking content available behind a collapsible/expandable section if needed for debugging, (3) set a max-height with scroll on any visible thinking block, (4) consider not streaming thinking tokens to the client at all: have the server withhold them, send only a lightweight status signal to drive the indicator, and stream only the final response text to the UI.
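Option (4) can be sketched as a small server-side filter. This is a minimal illustration, not a definitive implementation: `StreamEvent`, `UiEvent`, the "thinking"/"text" kind labels, and `filter_for_ui` are hypothetical names, and it assumes the provider's stream has already been normalized into events that say whether a chunk is reasoning or answer text.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class StreamEvent:
    """Hypothetical normalized upstream event: kind is "thinking" or "text"."""
    kind: str
    content: str


@dataclass
class UiEvent:
    """Hypothetical event sent to the client: "thinking_status" or "text"."""
    kind: str
    payload: str


def filter_for_ui(
    events: Iterable[StreamEvent],
    clock: Callable[[], float] = time.monotonic,
    status_interval: float = 1.0,
) -> Iterator[UiEvent]:
    """Drop thinking content; emit throttled status updates and answer text."""
    started = None      # time the first thinking chunk arrived
    last_status = None  # time the last status update was emitted
    for ev in events:
        if ev.kind == "thinking":
            now = clock()
            if started is None:
                started = now
            # Throttle: at most one status update per status_interval seconds.
            if last_status is None or now - last_status >= status_interval:
                last_status = now
                elapsed = now - started
                yield UiEvent("thinking_status", f"thinking... {elapsed:.0f}s")
        elif ev.kind == "text":
            # Only answer text is forwarded verbatim.
            yield UiEvent("text", ev.content)
```

The client then needs to handle only two event kinds, and the thinking text never crosses the wire. If it is still wanted for debugging, it could be logged server-side before being dropped.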
Journey Context:
Extended thinking models (e.g., Claude with thinking enabled) can produce thousands of tokens of internal reasoning before the actual response begins. If you stream this directly into the UI, users see a wall of dense, technical reasoning text that: (a) they did not ask for, (b) is often incomprehensible internal monologue, (c) pushes the actual answer far below the fold, (d) creates anxiety that something is wrong or the model is confused. The thinking content is useful for debugging and trust but harmful as the primary UX. The tradeoff: hiding thinking reduces transparency, but showing it raw overwhelms casual users and makes the product feel slow even though tokens are streaming. Progressive disclosure (hidden by default, available on demand) is the right call.
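The progressive-disclosure idea can be illustrated with a small view model that stores reasoning and answer text separately and keeps the reasoning collapsed unless the user expands it. This is only a sketch under assumptions: `ResponseView`, its fields, and the "[+]"/"[-]" markers are hypothetical, and a real UI would render a collapsible element with a max-height and scroll rather than plain strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResponseView:
    """Hypothetical view model: reasoning is kept, but hidden by default."""
    thinking_parts: List[str] = field(default_factory=list)
    answer_parts: List[str] = field(default_factory=list)
    thinking_expanded: bool = False  # collapsed unless the user opts in

    def add(self, kind: str, content: str) -> None:
        """Route an incoming chunk to the reasoning or the answer buffer."""
        if kind == "thinking":
            self.thinking_parts.append(content)
        elif kind == "text":
            self.answer_parts.append(content)

    def render(self) -> str:
        """Return the text to display for the current expand/collapse state."""
        answer = "".join(self.answer_parts)
        if not self.thinking_parts:
            return answer
        thinking = "".join(self.thinking_parts)
        if self.thinking_expanded:
            header = "[-] Reasoning\n" + thinking
        else:
            # Collapsed: show only a one-line summary, never the content.
            header = f"[+] Reasoning hidden ({len(thinking)} characters)"
        return header + "\n\n" + answer
```

Keeping the two buffers separate means the answer is always the primary content, and expanding the reasoning is a pure display decision that does not require re-requesting anything.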
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:18:03.677194+00:00— report_created — created