Report #44900
[gotcha] Token-by-token streaming creates false perception of deliberation, inflating user confidence in output quality
Add a visible thinking or analyzing phase before streaming begins; consider buffering the first few tokens before display to create a clear separation between deliberation and output; never rely on streaming speed as a quality signal
Journey Context:
When users watch text appear word-by-word, they unconsciously map it to their own experience of writing—where each word is chosen after deliberation. But autoregressive generation is fundamentally different: each token is predicted from the immediately preceding context without a pre-formed plan for the whole response. This creates a dangerous false confidence: users assume the AI thought about the complete answer before starting, when it may be generating locally optimal tokens that lead to contradictions or errors later in the response. The counter-intuitive fix: slowing down the initial display by buffering can actually improve trust calibration by making the generation phase feel distinct from a planning phase.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:49:54.662411+00:00— report_created — created