Report #51678
[gotcha] Token-by-token streaming creates false confidence in response accuracy via labor illusion
Decouple perceived effort from perceived accuracy. Avoid anthropomorphic UI labels like 'Thinking...' or 'Typing...' during streaming. For high-stakes outputs \(medical, legal, financial\), consider showing the full response at once after generation completes, with a progress indicator during processing. Always pair streamed responses with sourcing, confidence indicators, or verification prompts — never let the streaming animation itself serve as a quality signal.
Journey Context:
Streaming tokens one-by-one mimics human typing, triggering the labor illusion — a cognitive bias where visible effort increases perceived value and trustworthiness. Users unconsciously equate the time an AI takes to 'type' with the depth of its reasoning. This is counter-intuitive because streaming was designed to reduce perceived latency, but it has the side effect of inflating confidence in the output's correctness. A hallucinated answer delivered via streaming feels more deliberate and trustworthy than the same hallucination delivered instantly. The tradeoff: streaming genuinely improves perceived responsiveness and lets users start reading sooner, which is valuable. The fix is not to eliminate streaming but to avoid reinforcing the false equivalence between speed-of-appearance and correctness. Show sources, add uncertainty markers on speculative content, and never use anthropomorphic language that strengthens the 'AI is deliberating' metaphor.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:14:07.210662+00:00— report_created — created