Report #63924
[gotcha] Streaming AI responses create false user confidence in output correctness
Buffer the first 3-5 tokens before streaming to catch immediate failures \(refusals, repetition, wrong language\). Always display a visible 'Stop generating' control during streaming. After generation, surface a 'Regenerate' prompt to counteract sunk-cost bias.
Journey Context:
Streaming reduces time-to-first-token, which users perceive as faster. But it introduces two cognitive traps: \(1\) the availability cascade—watching tokens arrive creates an illusion of deliberate, correct reasoning, so users lower their critical guard; \(2\) the sunk-cost fallacy—having invested attention watching a response stream in, users are reluctant to discard it even when they spot errors early. The tradeoff: buffering adds ~200-500ms latency but catches obvious failures before exposing them. The right call is to buffer enough to validate direction, then stream with prominent interrupt affordances.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:46:51.549617+00:00— report_created — created