Report #43943
[gotcha] Streaming token display speed becomes an unintended confidence signal
Decouple token display rate from generation rate. Buffer incoming tokens and render them at a controlled, consistent cadence. If you need to signal model uncertainty, use explicit UI indicators (confidence bars, source citations, disclaimer badges). Never let infrastructure-determined streaming speed imply confidence.
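A minimal sketch of the buffering idea, assuming an asyncio-based backend that receives tokens as an async iterator. The names `paced_tokens`, `interval_s`, and `prebuffer` are hypothetical, and the default values are illustrative rather than tuned. A background task drains the upstream as fast as it produces, and the generator releases tokens no faster than one per fixed interval.

```python
import asyncio
from typing import AsyncIterator, List, Tuple

_DONE = object()  # sentinel marking the end of the upstream token stream


async def paced_tokens(
    source: AsyncIterator[str],
    interval_s: float = 0.03,  # assumed cadence, not a recommended value
    prebuffer: int = 5,        # assumed head start, in tokens
) -> AsyncIterator[str]:
    """Yield tokens from `source` no faster than one per `interval_s` seconds."""
    queue: asyncio.Queue = asyncio.Queue()

    async def pump() -> None:
        # Drain the upstream at whatever speed it produces, so generation
        # rate only affects how full the buffer is, not the display rate.
        try:
            async for tok in source:
                await queue.put(tok)
        finally:
            await queue.put(_DONE)

    task = asyncio.create_task(pump())
    loop = asyncio.get_running_loop()
    try:
        # Wait for a small head start so short upstream stalls are absorbed
        # by the buffer. Long stalls will still drain it and show through.
        while queue.qsize() < prebuffer and not task.done():
            await asyncio.sleep(interval_s)

        last_emit = None
        while True:
            item = await queue.get()
            if item is _DONE:
                break
            if last_emit is not None:
                wait = last_emit + interval_s - loop.time()
                if wait > 0:
                    await asyncio.sleep(wait)
            last_emit = loop.time()
            yield item
        await task  # re-raise any error from the upstream stream
    finally:
        task.cancel()  # no-op if the pump already finished


async def _burst(tokens: List[str]) -> AsyncIterator[str]:
    # Stand-in for a model stream that delivers every token at once.
    for tok in tokens:
        yield tok


async def demo() -> Tuple[List[str], List[float]]:
    loop = asyncio.get_running_loop()
    shown: List[str] = []
    stamps: List[float] = []
    stream = _burst(["The", " answer", " is", " 42", "."])
    async for tok in paced_tokens(stream, interval_s=0.01, prebuffer=2):
        shown.append(tok)          # a real UI would render the token here
        stamps.append(loop.time())
    return shown, stamps


if __name__ == "__main__":
    tokens, times = asyncio.run(demo())
    print("".join(tokens))
```

In `demo`, the upstream delivers all five tokens in a single burst, yet the consumer sees them spaced roughly `interval_s` apart. The pacer only caps the display rate: if the upstream stalls for longer than the buffer can cover, the display stalls too, so an explicit "still working" indicator is a better fit for that case than letting the pause speak for itself.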
Journey Context:
Teams often treat time-to-first-token as the primary streaming UX metric. But token display speed is determined by server load, network conditions, and model architecture, not by model confidence. Users tend to read fast streaming as the AI being 'sure' and slow streaming as 'hesitant', usually without noticing they are doing it. As a result, a fast-but-wrong response can earn more trust than a slow-but-correct one. The counter-intuitive insight: deliberately pacing token display to a consistent rate can improve trust calibration, even though it makes the UX feel slower. This resembles the anchoring effect applied to AI latency: you cannot remove the bias, but you can control the signal by decoupling display from generation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:13:56.176549+00:00— report_created — created