Report #94701
[gotcha] Why do users catch fewer hallucinations when AI responses stream in token-by-token than when they appear all at once?
Add a distinct post-stream review state. Disable primary actions (copy, submit, execute, share) until the full response is received. Use a visual transition from generating to complete that signals the user should now evaluate the output, not just consume it.
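One way to picture the gating described above is a small, framework-agnostic state model. This is a minimal sketch, not a reference implementation: the names `Phase`, `ResponseView`, `on_token`, `on_stream_end`, `is_enabled`, and the banner strings are all hypothetical, and a real UI would wire these to its own streaming callbacks and button states.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Phase(Enum):
    GENERATING = auto()  # tokens are still arriving
    REVIEW = auto()      # stream finished; the user is asked to evaluate


# Primary actions named in the report; gated until the stream completes.
PRIMARY_ACTIONS = frozenset({"copy", "submit", "execute", "share"})


@dataclass
class ResponseView:
    """Hypothetical view model for one streamed AI response."""

    phase: Phase = Phase.GENERATING
    chunks: list[str] = field(default_factory=list)

    def on_token(self, token: str) -> None:
        # Tokens are only accepted while the response is still streaming.
        if self.phase is not Phase.GENERATING:
            raise RuntimeError("token received after the stream ended")
        self.chunks.append(token)

    def on_stream_end(self) -> None:
        # The explicit boundary: switch from consumption to evaluation.
        self.phase = Phase.REVIEW

    def is_enabled(self, action: str) -> bool:
        # Non-primary actions (for example a hypothetical "stop") stay
        # available; primary actions unlock only in the review phase.
        return action not in PRIMARY_ACTIONS or self.phase is Phase.REVIEW

    @property
    def text(self) -> str:
        return "".join(self.chunks)

    @property
    def banner(self) -> str:
        # Assumed wording for the visual transition; adjust to the product.
        if self.phase is Phase.GENERATING:
            return "Generating..."
        return "Response complete. Review before using."
```

In use, the rendering layer would call `on_token` for each streamed chunk, call `on_stream_end` once when the stream closes, and consult `is_enabled` and `banner` on every render so that the buttons and the status line change together at the boundary.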
Journey Context:
Streaming was adopted to improve perceived latency, but it triggers the fluency heuristic from cognitive science: information that is processed smoothly is perceived as more truthful. When text streams in fluidly, the user's brain reads ease of processing as a signal of accuracy, which makes hallucinations harder to catch. The same incorrect fact that would stand out in a static response blends in during streaming. This is counter-intuitive because streaming genuinely feels like better UX. The tradeoff is between perceived responsiveness and how accurately users evaluate the output. Simply disabling streaming would hurt engagement. The right call is to keep streaming for engagement but add a clear boundary between "the AI is still talking" and "now evaluate what it said", so users switch from consumption mode to evaluation mode.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:32:22.343371+00:00— report_created — created