Report #85621
[gotcha] Users act on incomplete AI responses before streaming finishes
When streaming responses that users might act on (code, instructions, data), add visual guardrails: (a) show a persistent 'still generating...' indicator until completion; (b) disable copy and action buttons until the stream finishes; (c) mark code blocks as 'generating...' until they are complete; and, critically, (d) if the response is truncated by max_tokens, show a prominent 'response was cut off: this output is incomplete' warning rather than letting the partial output stand as-is.
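Guardrails (a) through (d) can be driven from one small piece of state that the UI renders from. The following is a minimal, framework-agnostic sketch in Python, not a definitive implementation: `StreamView`, `Status`, `FENCE`, and the banner strings are hypothetical names introduced for illustration, and the fence-counting check for open code blocks is a simple heuristic that assumes Markdown-style triple-backtick fences.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

FENCE = "`" * 3  # Markdown code-fence marker (assumed output format)


class Status(Enum):
    GENERATING = "generating"
    COMPLETE = "complete"
    TRUNCATED = "truncated"


@dataclass
class StreamView:
    """Hypothetical view model for a single streamed response."""

    text: str = ""
    status: Status = Status.GENERATING

    def append(self, delta: str) -> None:
        # Ignore late tokens once the stream has been closed.
        if self.status is Status.GENERATING:
            self.text += delta

    def finish(self, truncated: bool) -> None:
        self.status = Status.TRUNCATED if truncated else Status.COMPLETE

    @property
    def actions_enabled(self) -> bool:
        # (b) copy/run buttons unlock only for a fully generated response.
        return self.status is Status.COMPLETE

    @property
    def code_block_open(self) -> bool:
        # (c) an odd number of fence markers means a code block is unfinished.
        return self.text.count(FENCE) % 2 == 1

    @property
    def banner(self) -> Optional[str]:
        if self.status is Status.GENERATING:
            return "Still generating..."  # (a)
        if self.status is Status.TRUNCATED:
            return "Response was cut off: this output is incomplete."  # (d)
        return None
```

In this sketch a truncated response keeps its action buttons disabled. That is one possible design choice; a product could instead allow copying behind an explicit confirmation, as long as the warning stays visible.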
Journey Context:
Users start reading and acting on AI output before it finishes generating. In code generation, developers have been observed copying and running code from a response that is still streaming: the code can be syntactically valid at token 50 yet semantically wrong, because the critical logic only arrives at token 80. The streaming UX creates a premature-commitment bias: once users start reading, they form a mental model from the partial response and are reluctant to revise it when the rest arrives. The most dangerous case is a response that hits max_tokens and is silently truncated. The user sees what looks like a complete response, because it happens to end at a natural-looking sentence boundary, but it is actually incomplete. This is a silent data corruption vector. The finish_reason field exists specifically to signal this, but most frontends ignore it.
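To make the truncation check concrete, here is a minimal sketch of reading finish_reason while consuming a stream. It assumes OpenAI-style Chat Completions chunks, represented as plain dicts, where each chunk carries `choices[0].delta.content` and `choices[0].finish_reason`, and where `finish_reason` is `"length"` when the token limit was hit. Other providers use different field names and values, so treat the shape as an assumption. The helpers `consume_stream` and `is_incomplete` are hypothetical names.

```python
from typing import Iterable, Optional, Tuple


def consume_stream(chunks: Iterable[dict]) -> Tuple[str, Optional[str]]:
    """Concatenate streamed text and return it with the last finish_reason seen."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        choice = choices[0]
        delta = choice.get("delta") or {}
        parts.append(delta.get("content") or "")
        # In the assumed shape, finish_reason is null on intermediate chunks.
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason


def is_incomplete(finish_reason: Optional[str]) -> bool:
    # "length": the token limit cut the response off.
    # None: the stream ended without any finish signal (e.g. a dropped connection).
    return finish_reason is None or finish_reason == "length"
```

A frontend would call `is_incomplete` once the stream closes and show the truncation warning when it returns true, instead of presenting the partial text as a finished answer.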
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:18:02.472960+00:00— report_created — created