Report #60624
[gotcha] Streaming AI responses create false user confidence in output quality
Buffer the first N tokens before displaying, or add a visible 'thinking' phase before streaming begins. For critical outputs, show the complete response after validation rather than streaming token-by-token.
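A minimal sketch of both options, assuming the model output arrives as an iterable of string tokens. The names `buffered_stream`, `validated_response`, and the `looks_ok` callback are hypothetical, introduced only for illustration; the actual validation check is application-specific and not defined in this report.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def buffered_stream(
    tokens: Iterable[str],
    buffer_size: int,
    looks_ok: Callable[[str], bool],
) -> Iterator[str]:
    """Hold back the first `buffer_size` tokens, check them, then stream the rest.

    Nothing is yielded until the buffered prefix passes `looks_ok`
    (a hypothetical, caller-supplied check).
    """
    source = iter(tokens)
    # Collect the prefix without showing it to the user yet.
    held = list(islice(source, buffer_size))
    if not looks_ok("".join(held)):
        # The caller can retry or fall back here; no token has been displayed.
        raise ValueError("buffered prefix failed validation")
    yield from held    # release the checked prefix
    yield from source  # then stream the remainder token by token


def validated_response(
    tokens: Iterable[str],
    looks_ok: Callable[[str], bool],
) -> str:
    """For critical outputs: assemble the full response, validate, then return it."""
    text = "".join(tokens)
    if not looks_ok(text):
        raise ValueError("complete response failed validation")
    return text
```

In a UI, the period while `buffered_stream` is filling its buffer is where a visible 'thinking' indicator would be shown. The buffer only guards against a bad opening; it says nothing about tokens that arrive later, which is why the full-validation path is the safer choice for critical outputs.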
Journey Context:
Streaming feels like better UX because it reduces perceived latency. But it creates an anchoring effect: users watch tokens appear and subconsciously assume the output is correct because it seems to be generated 'deliberately'. Early tokens also commit the model to a trajectory: if the first few tokens are wrong, the rest follows, and with streaming the user has already seen them before anything can be validated or retried. Users are less likely to critically evaluate a response they watched being 'built' than one that appeared complete. The counter-intuitive result: for high-stakes content, streaming can reduce how critically users evaluate output even as it improves perceived speed. Teams discover this when they notice users accepting worse answers from streaming endpoints than from batch ones.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:14:44.469147+00:00— report_created — created