Report #53349
[gotcha] streaming AI response commits UI to partial output that cannot be retracted
Buffer the first 3-5 tokens (or ~200ms) before rendering anything to catch early refusals and API errors. Always render streaming responses with a visible 'still generating' indicator (animated cursor, pulsing border). On mid-stream error, immediately mark the partial as incomplete (reduce opacity, add error badge) and offer retry. Never let a stopped stream look like a completed response.
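One way to keep a stopped stream from looking finished is to track an explicit status next to the text and derive the indicator, dimming, and retry button from it. The sketch below assumes a simple event model; `StreamView`, `StreamEvent`, `reduceStream`, and `presentation` are hypothetical names for illustration, not part of any SDK or UI framework.

```typescript
// Illustrative sketch only: all names here are hypothetical.
type StreamStatus = "streaming" | "complete" | "incomplete";

interface StreamView {
  text: string;
  status: StreamStatus;
  error?: string;
}

type StreamEvent =
  | { kind: "token"; text: string }
  | { kind: "done" }
  | { kind: "error"; message: string };

function initialView(): StreamView {
  return { text: "", status: "streaming" };
}

// Pure state transition: the UI re-renders from the returned view.
function reduceStream(view: StreamView, event: StreamEvent): StreamView {
  // Once the stream has ended (either way), late events are ignored.
  if (view.status !== "streaming") return view;
  switch (event.kind) {
    case "token":
      return { ...view, text: view.text + event.text };
    case "done":
      return { ...view, status: "complete" };
    case "error":
      // Keep the partial text, but flag it so it cannot pass as complete.
      return { ...view, status: "incomplete", error: event.message };
  }
}

// Maps status to the visual cues named in the report.
function presentation(view: StreamView): {
  showCursor: boolean; // 'still generating' indicator
  dimmed: boolean; // reduced opacity for an incomplete partial
  showRetry: boolean; // error badge plus retry affordance
} {
  return {
    showCursor: view.status === "streaming",
    dimmed: view.status === "incomplete",
    showRetry: view.status === "incomplete",
  };
}
```

Because "complete" is only reachable through an explicit `done` event, a connection that simply stops delivering tokens stays in "streaming" (cursor still visible) until the caller reports an error or a timeout.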
Journey Context:
Developers default to streaming because it shortens the time until the user sees the first token. But streaming creates an irreversible commitment: once tokens are on screen, you cannot un-show them. If the model produces a refusal two tokens in, you have already started rendering. If the API errors mid-stream, the partial response sits on screen looking like a complete answer. The counter-intuitive insight is that a small initial buffer can catch most early refusals and immediate API errors before they reach the UI, at negligible perceived-latency cost. Errors that arrive later in the stream still need explicit handling in the UI. Most streaming implementations treat error handling as an edge case rather than a primary UX concern. That is backwards, because the error state is when users most need a clear UI.
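The initial buffer described above can be sketched as a small gate that holds tokens until a token count or a time budget is reached. This is a minimal illustration under stated assumptions: `InitialBufferGate` and its options are hypothetical names, the clock starts at the first token (the report does not say whether it should start at the request instead), and the current time is passed in by the caller so the logic stays deterministic.

```typescript
// Illustrative sketch only: class and option names are hypothetical.
interface GateOptions {
  minTokens: number; // e.g. 3-5, as suggested in the report
  maxWaitMs: number; // e.g. roughly 200
}

class InitialBufferGate {
  private readonly opts: GateOptions;
  private held: string[] = [];
  private open = false;
  private startedAt: number | null = null;

  constructor(opts: GateOptions) {
    this.opts = opts;
  }

  // Returns the text that may be rendered now ("" while still buffering).
  push(token: string, nowMs: number): string {
    if (this.open) return token;
    // Assumption: the time budget is measured from the first token.
    if (this.startedAt === null) this.startedAt = nowMs;
    this.held.push(token);
    const enoughTokens = this.held.length >= this.opts.minTokens;
    const enoughTime = nowMs - this.startedAt >= this.opts.maxWaitMs;
    return enoughTokens || enoughTime ? this.flush() : "";
  }

  // Stream ended normally: release whatever is still held (short replies).
  finish(): string {
    return this.flush();
  }

  // Stream failed: drop held tokens and report whether anything was shown.
  fail(): { rendered: boolean } {
    const rendered = this.open;
    this.held = [];
    return { rendered };
  }

  private flush(): string {
    this.open = true;
    const text = this.held.join("");
    this.held = [];
    return text;
  }
}
```

If `fail()` reports `rendered: false`, the UI can show a clean error state with nothing to retract; if it reports `true`, the partial output is already on screen and needs to be marked incomplete. The caller can also inspect the first flushed text for a refusal before rendering it. Note that this sketch only checks the time budget when a token arrives, so a real UI would likely add a timer to flush held tokens when the stream stalls.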
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:02:37.916742+00:00— report_created — created