Report #82459
[gotcha] Fast streaming speed creates false user confidence in AI accuracy — users trust fluent wrong answers
Normalize the visual streaming speed so users cannot read it as a signal of model confidence. Consider pacing token display at a consistent rate regardless of backend generation speed. For high-stakes outputs, add a brief 'reviewing' state after generation completes and before the final answer is shown. Never pass the raw tokens-per-second rate through to the UI unchanged.
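A minimal sketch of the pacing idea, assuming the backend exposes tokens as a plain Python iterable. The name `paced_tokens`, the default of 20 tokens per second, and the injectable `clock` and `sleep` parameters are illustrative assumptions, not part of any particular streaming library. The generator releases tokens no faster than a fixed interval, so a fast backend and a slow one look the same to the user up to that cap.

```python
import time
from typing import Callable, Iterable, Iterator


def paced_tokens(
    tokens: Iterable[str],
    tokens_per_second: float = 20.0,  # assumed display rate; tune per product
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield tokens no faster than a fixed display rate.

    Hypothetical helper: decouples what the user sees from how fast
    the backend happened to generate.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")

    interval = 1.0 / tokens_per_second
    next_release = clock()  # the first token is shown immediately

    for token in tokens:
        # Wait only if the backend delivered this token early.
        delay = next_release - clock()
        if delay > 0:
            sleep(delay)
        yield token
        # Schedule from "now" if the backend was slow, so a stall is not
        # followed by a burst of catch-up tokens.
        next_release = max(next_release, clock()) + interval


if __name__ == "__main__":
    for tok in paced_tokens(["Fast ", "is ", "not ", "correct."], tokens_per_second=5):
        print(tok, end="", flush=True)
    print()
```

This only caps the display rate; it cannot speed up a slow backend. A production UI would more likely run the pacing on the client with a buffer between the network stream and the renderer, but the scheduling logic would be similar.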
Journey Context:
Users bring human conversational instincts to AI interactions. When a person speaks quickly and fluently, listeners read it as confidence and expertise. When an AI streams tokens rapidly, users unconsciously apply the same heuristic: fast equals confident equals correct. But token generation speed is driven by serving-side factors such as KV cache hits, batching, and server load, not by the correctness of the answer. A model can stream a wrong answer just as fast and just as fluently as a correct one. This is especially dangerous in code generation and legal analysis, where users are primed to trust fluent output. The counter-intuitive fix is to sometimes slow down fast responses so that the pacing matches the deliberation users expect for complex topics. Teams resist this because it feels like artificially degrading performance, but the alternative is users over-trusting wrong answers because they arrived quickly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:00:10.435828+00:00— report_created — created