Report #20800
[gotcha] Streaming AI responses create premature user commitment and false confidence in partial output
Buffer an initial meaningful chunk (50-100 tokens) before streaming to the user. Visually distinguish the 'still generating' state from the 'generation complete' state. For code output, disable copy/apply/execute actions until generation finishes with finish_reason 'stop'. Add a subtle visual indicator (pulsing border, dimmed opacity) that clears only on confirmed completion.
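The buffering step can be sketched as a generator that holds back the first tokens and releases them as one chunk. This is a minimal Python sketch, assuming the stream arrives as an iterable of text pieces and that each piece counts as one token (real API deltas may carry several tokens). The name `buffer_first_chunk` and the default threshold of 64 are hypothetical choices within the 50-100 range above.

```python
from typing import Iterable, Iterator


def buffer_first_chunk(tokens: Iterable[str], min_tokens: int = 64) -> Iterator[str]:
    """Hold back the first `min_tokens` pieces, emit them as one chunk,
    then pass the remaining pieces through one at a time.

    Hypothetical helper: each item in `tokens` is treated as one token.
    """
    it = iter(tokens)
    buffer: list[str] = []

    # Phase 1: accumulate silently until the threshold is reached
    # (or the stream ends early).
    for tok in it:
        buffer.append(tok)
        if len(buffer) >= min_tokens:
            break

    # Release the buffered prefix as a single coherent chunk.
    if buffer:
        yield "".join(buffer)

    # Phase 2: the same iterator resumes where phase 1 stopped,
    # so the rest of the response streams normally.
    for tok in it:
        yield tok
```

If the whole response is shorter than the threshold, it is emitted once at the end, so very short answers lose incremental streaming; that seems an acceptable trade-off since short answers finish quickly anyway.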
Journey Context:
Token-by-token streaming was designed to reduce perceived latency, but it triggers confirmation bias: users form a hypothesis from the first few tokens and then read the rest less critically. Authority bias amplifies this, because the gradual reveal resembles human typing, which users unconsciously associate with deliberate thought. In code generation, early tokens lock in the approach (library choice, algorithm, variable names), so later tokens that diverge feel wrong even when they are corrections.

The fix is not to stop streaming (perceived latency matters) but to prevent users from acting on incomplete output. Buffering the first chunk makes the initial impression coherent rather than misleadingly partial, and disabling actions until finish_reason 'stop' prevents the most dangerous outcome: executing half-generated code.
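The action-gating rule can be modeled as a small state machine. The sketch below is hypothetical (`StreamView` and `GenState` are illustrative names, not part of any UI framework) and assumes the client receives text deltas followed by a single finish_reason value, as OpenAI-style chat completion streams do. Any value other than 'stop', such as 'length' when the output hit the token limit, leaves the output marked incomplete, so copy/apply/execute stay disabled and the in-progress styling stays on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GenState(Enum):
    GENERATING = "generating"  # deltas still arriving
    COMPLETE = "complete"      # stream ended with finish_reason == "stop"
    INCOMPLETE = "incomplete"  # stream ended for any other reason


@dataclass
class StreamView:
    """Hypothetical view model for one streamed response."""

    text: str = ""
    state: GenState = GenState.GENERATING

    def on_delta(self, delta: str) -> None:
        # Append streamed text; deltas after the finish signal are a protocol error.
        if self.state is not GenState.GENERATING:
            raise RuntimeError("delta received after stream finished")
        self.text += delta

    def on_finish(self, finish_reason: Optional[str]) -> None:
        # Only an explicit "stop" counts as confirmed completion.
        self.state = GenState.COMPLETE if finish_reason == "stop" else GenState.INCOMPLETE

    @property
    def actions_enabled(self) -> bool:
        # Copy/apply/execute are allowed only after confirmed completion.
        return self.state is GenState.COMPLETE

    @property
    def show_in_progress_styling(self) -> bool:
        # Pulsing border / dimmed opacity clears only on confirmed completion,
        # so truncated output keeps looking unfinished.
        return self.state is not GenState.COMPLETE
```

Keeping INCOMPLETE separate from COMPLETE lets the UI tell the user the output was cut off instead of treating truncated code as finished.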
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:19:32.951551+00:00— report_created — created