Report #66748
[gotcha] Streaming AI responses create false user confidence in output accuracy via the fluency heuristic
Mark streaming content as provisional or draft until the full response is complete. Add a post-stream review step before users can act on the content (e.g., a 'Copy' or 'Run' button that only activates after completion). Consider buffering the first N tokens before displaying them, to catch early hallucination patterns. Visually distinguish the 'still generating' state from the 'response complete' state.
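The idea of buffering the first N tokens can be sketched roughly as follows. This is a minimal illustration, not a tested implementation: `stream_with_buffered_prefix` and the `looks_suspicious` predicate are hypothetical names, and what counts as an "early hallucination pattern" is left entirely to the caller, since the report does not define it.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def stream_with_buffered_prefix(
    tokens: Iterable[str],
    n: int,
    looks_suspicious: Callable[[str], bool],
) -> Iterator[str]:
    """Hold back the first n tokens, check them, then stream the rest.

    `looks_suspicious` is a hypothetical, caller-supplied check on the
    joined prefix; this sketch does not prescribe how it should work.
    """
    it = iter(tokens)
    # Collect up to n tokens without showing anything to the user yet.
    held = list(islice(it, n))
    if looks_suspicious("".join(held)):
        # Assumption: the caller decides whether to retry or show an error.
        raise ValueError("buffered prefix failed the early check")
    # The prefix passed: release the held tokens, then stream normally.
    yield from held
    yield from it


if __name__ == "__main__":
    demo = ["The ", "answer ", "is ", "42."]
    for token in stream_with_buffered_prefix(demo, 2, lambda prefix: False):
        print(token, end="")
    print()
```

The cost is that nothing is displayed until N tokens have arrived, so N trades a little perceived latency for the chance to reject a bad start before the user sees it.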
Journey Context:
Streaming was adopted to reduce perceived latency, but it introduces a cognitive bias. Under the 'fluency heuristic', information that is easier to process is judged as more likely to be true. Streamed text is read line by line with smooth visual flow, which creates an illusion of coherent, confident reasoning even when the AI is hallucinating. Users begin committing to the content before seeing the whole picture. Compare this with receiving a complete response at once, where users naturally scan the whole thing before committing. The counter-intuitive insight: streaming, intended to improve UX, actually reduces error detection rates. However, removing streaming entirely hurts perceived performance. The right tradeoff is to stream for speed but clearly mark content as provisional, and to discourage users from acting on partial content by disabling copy/execute/submit actions until the response completes.
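One way to model the tradeoff described above is a small state object that the UI consults before enabling any action. This is a sketch under stated assumptions: `StreamState`, `StreamedResponse`, and the label strings are illustrative names, not part of any particular framework.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class StreamState(Enum):
    GENERATING = "generating"
    COMPLETE = "complete"


@dataclass
class StreamedResponse:
    """Hypothetical view-model: text streams in, actions stay locked."""

    chunks: List[str] = field(default_factory=list)
    state: StreamState = StreamState.GENERATING

    def append(self, chunk: str) -> None:
        if self.state is StreamState.COMPLETE:
            raise RuntimeError("cannot append to a completed response")
        self.chunks.append(chunk)

    def finish(self) -> None:
        self.state = StreamState.COMPLETE

    @property
    def text(self) -> str:
        return "".join(self.chunks)

    @property
    def actions_enabled(self) -> bool:
        # Copy/Run/Submit buttons should bind to this flag.
        return self.state is StreamState.COMPLETE

    @property
    def status_label(self) -> str:
        # Illustrative wording for the two visually distinct states.
        if self.state is StreamState.GENERATING:
            return "Draft (still generating)"
        return "Response complete"

    def copy(self) -> str:
        # Refuse to hand out partial content while it is provisional.
        if not self.actions_enabled:
            raise RuntimeError("response is still provisional")
        return self.text


if __name__ == "__main__":
    response = StreamedResponse()
    response.append("rm -rf ./build ")
    print(response.status_label, response.actions_enabled)
    response.append("&& make")
    response.finish()
    print(response.status_label, response.copy())
```

Keeping the gate in the model rather than in each button means every action (copy, execute, submit) is disabled by the same check, so a new action cannot accidentally bypass it.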
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:30:53.709848+00:00— report_created — created