Report #64638
[gotcha] Token-by-token streaming causes users to act on partial or pivoting responses before generation completes
Visually distinguish in-progress generation from committed content: render streaming text dimmed or italic until it completes. For code generation, keep copy buttons disabled until generation finishes. Consider buffering the first meaningful chunk (e.g., the first complete sentence or code block) before displaying anything. Never let users execute or apply AI-generated code mid-stream.
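A minimal sketch of the gating rule above, assuming a simple in-memory model of one streamed response. The class name `StreamingMessage` and its fields are hypothetical and not tied to any particular UI framework; the rendering layer would read `style`, `can_copy`, and `can_apply` to decide how to draw the message and which buttons to enable.

```python
from dataclasses import dataclass, field


@dataclass
class StreamingMessage:
    """Tracks one AI response while it streams (hypothetical UI model)."""

    parts: list[str] = field(default_factory=list)
    is_complete: bool = False

    def append(self, token: str) -> None:
        # Tokens are only accepted while generation is still in progress.
        if self.is_complete:
            raise RuntimeError("cannot append to a completed message")
        self.parts.append(token)

    def finish(self) -> None:
        # Called once, when the model signals the end of generation.
        self.is_complete = True

    @property
    def text(self) -> str:
        return "".join(self.parts)

    @property
    def style(self) -> str:
        # The UI maps "in_progress" to dimmed/italic text, "committed" to normal.
        return "committed" if self.is_complete else "in_progress"

    @property
    def can_copy(self) -> bool:
        # Copy stays disabled until the full response exists.
        return self.is_complete

    @property
    def can_apply(self) -> bool:
        # Executing or applying generated code is never allowed mid-stream.
        return self.is_complete
```

Deriving every affordance from the single `is_complete` flag keeps the styling and the button state from drifting apart.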
Journey Context:
Streaming feels faster, so teams default to it. The gotcha is that users start reading and acting on partial responses the moment tokens appear. If the AI starts down a wrong path and then self-corrects mid-generation (common with chain-of-thought), the user has already internalized the wrong direction. This is especially dangerous for code, because users copy and paste partial code that the AI was about to revise. The AI might write `def process(data):`, realize it needs a different signature, and rewrite it, but the user has already copied the first version. The counter-intuitive insight: a 500ms buffer before first display, with the full first chunk appearing at once, often produces better outcomes than instant streaming because users see a coherent starting point.
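The first-chunk buffering described above could be sketched as a generator that withholds tokens until a coherent starting point exists. This version is content-based rather than time-based: it releases on the first complete sentence or the first closed code fence, with a `max_chars` cap as a fallback. The function names, the sentence heuristic, and the 400-character cap are illustrative assumptions; a real client might combine this with a time limit such as the 500ms mentioned above.

```python
import re
from typing import Iterable, Iterator

# Heuristic sentence end: ".", "!" or "?" followed by whitespace (an assumption).
_SENTENCE_END = re.compile(r"[.!?]\s")
_FENCE = "`" * 3  # Markdown code fence marker


def _first_chunk_ready(buffer: str) -> bool:
    fences = buffer.count(_FENCE)
    if fences > 0:
        # A code block has been opened; wait until it is closed again.
        return fences % 2 == 0
    return _SENTENCE_END.search(buffer) is not None


def buffer_first_chunk(tokens: Iterable[str], max_chars: int = 400) -> Iterator[str]:
    """Yield the first meaningful chunk at once, then pass tokens through."""
    buffer = ""
    released = False
    for token in tokens:
        if released:
            yield token
            continue
        buffer += token
        # Release on a complete sentence/code block, or when the cap is hit.
        if _first_chunk_ready(buffer) or len(buffer) >= max_chars:
            released = True
            yield buffer
    if not released and buffer:
        # The stream ended before any boundary appeared; show what exists.
        yield buffer
```

For example, `list(buffer_first_chunk(["Use ", "a ", "buffer. ", "Then ", "stream."]))` yields the first sentence as one piece, followed by the remaining tokens individually.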
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:58:52.568423+00:00— report_created — created