Report #65291
[gotcha] Streaming AI output in real-time means users see and act on incorrect early tokens before the model can self-correct
For high-stakes outputs (code, data, medical, legal), buffer a short window before streaming (100-300 ms, or until the first complete sentence), show a 'generating...' indicator while buffering, and always provide a prominent 'stop generating' control. Never auto-apply or auto-execute streamed code output until generation has completed and the output has been validated.
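A minimal Python sketch of the buffering rule, assuming tokens arrive as an iterable of strings. `buffered_stream`, the 0.2 s default window, and the sentence-boundary regex are hypothetical illustration choices, not any SDK's API. The UI would keep its 'generating...' indicator up until the first chunk is yielded, and `should_stop` stands in for the 'stop generating' button. The elapsed time is only checked when a token arrives, and the sentence regex is a crude heuristic.

```python
import re
import time
from typing import Callable, Iterable, Iterator, List

# Crude sentence-boundary heuristic (assumption): '.', '!' or '?' followed by
# whitespace. It also fires on abbreviations such as "e.g. ".
SENTENCE_END = re.compile(r"[.!?]\s")


def buffered_stream(
    tokens: Iterable[str],
    window_s: float = 0.2,  # hypothetical default inside the 100-300 ms range
    clock: Callable[[], float] = time.monotonic,
    should_stop: Callable[[], bool] = lambda: False,
) -> Iterator[str]:
    """Hold tokens back until the first complete sentence or until `window_s`
    has elapsed, then pass later tokens through one by one.

    While nothing has been yielded, the UI shows its 'generating...' indicator.
    `should_stop` models the 'stop generating' control: once it returns True,
    no further text is shown, including anything still held in the buffer.
    """
    start = clock()  # window starts when the consumer begins iterating
    held: List[str] = []
    released = False
    for tok in tokens:
        if should_stop():
            return
        if released:
            yield tok
            continue
        held.append(tok)
        text = "".join(held)
        # The elapsed time is only checked when a token arrives.
        if SENTENCE_END.search(text) or clock() - start >= window_s:
            released = True
            held.clear()
            yield text
    if held:
        # Stream ended inside the window: show whatever was generated.
        yield "".join(held)
```

Injecting `clock` keeps the function deterministic in tests; in an async client the same logic would sit inside an `async for` loop over the provider's stream.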
Journey Context:
Streaming is the default for chat UX because it feels responsive. But streaming creates an irrevocable commitment: once a token is on screen, the user may already have read it and acted on it. If the model starts down a wrong path and then self-corrects ('Actually, let me reconsider...'), the user has already internalized the wrong information. This is especially dangerous for code generation, where users copy-paste partial output into terminals before the answer is finished. The naive approach streams everything immediately with no guardrail. The fix is risk-calibrated streaming: for casual chat, stream freely; for code, data, or instructions, buffer enough output to judge whether it is coherent, and never auto-execute partial streamed output. Anthropic's streaming best practices explicitly recommend giving users control over the generation process.
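The risk-calibration rule and the no-auto-execute rule can be sketched together in Python. The category names, the 0.3 s window, `policy_for`, and `ExecutionGate` are hypothetical. The validator shown is only a syntax check via `ast.parse`: it proves the code parses, not that it is safe to run, so a real app would still ask the user before executing anything.

```python
import ast
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical request categories treated as high-stakes.
HIGH_RISK_KINDS = {"code", "data", "instructions", "medical", "legal"}


@dataclass(frozen=True)
class StreamPolicy:
    buffer_window_s: float  # 0.0 means stream tokens immediately


def policy_for(kind: str) -> StreamPolicy:
    """Casual chat streams freely; high-stakes kinds get a short buffer."""
    if kind in HIGH_RISK_KINDS:
        return StreamPolicy(buffer_window_s=0.3)  # upper end of 100-300 ms
    return StreamPolicy(buffer_window_s=0.0)


def python_syntax_ok(code: str) -> bool:
    """Syntax-only check: parsing succeeds. Says nothing about safety."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


class ExecutionGate:
    """Accumulates streamed code and releases it only after the stream has
    completed normally and the validator accepts the full text."""

    def __init__(self, validate: Callable[[str], bool]) -> None:
        self._chunks: List[str] = []
        self._state = "streaming"  # becomes "complete" or "cancelled"
        self._validate = validate

    def feed(self, chunk: str) -> None:
        if self._state != "streaming":
            raise RuntimeError("stream already closed")
        self._chunks.append(chunk)

    def finish(self) -> None:
        if self._state == "streaming":
            self._state = "complete"

    def cancel(self) -> None:
        # User pressed 'stop generating': the output stays partial for good.
        if self._state == "streaming":
            self._state = "cancelled"

    def runnable_code(self) -> Optional[str]:
        if self._state != "complete":
            return None  # partial or cancelled output is never executable
        code = "".join(self._chunks)
        return code if self._validate(code) else None
```

Keeping "cancelled" as a separate terminal state means a stopped generation can never later be mistaken for a finished one.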
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:04:16.478846+00:00— report_created — created