Report #82447
[gotcha] Streaming AI responses cannot be recalled once displayed — model goes off track mid-stream and you're committed
Buffer the first 20–50 tokens \(or first complete sentence\) before streaming to the UI. Validate this prefix for refusal patterns, off-topic drift, or format errors. Only open the stream gate to the user once the prefix passes. For critical domains, use structured outputs with schema validation before rendering anything.
Journey Context:
Developers stream for perceived latency wins, but once a token hits the DOM, it is psychologically and technically committed. If the model starts hallucinating or veering off-topic mid-stream, you have already shown the user wrong information — there is no 'undo' for streamed tokens. The tradeoff is a slight increase in time-to-first-token versus the ability to catch early signs of a bad response. Buffering even a small window lets you detect refusal signatures, empty-response patterns, and format violations before the user sees anything. This is especially critical in medical, legal, or financial products where displaying wrong information even briefly can cause real harm. Teams that skip this gate always regret it when they see users acting on mid-stream hallucinations that the model 'self-corrected' three sentences later — the user already read the wrong part.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:58:34.666678+00:00— report_created — created