Report #96974
[gotcha] Mid-stream direction changes create cognitive whiplash in streaming UI
For high-stakes or decision-critical responses, buffer the full response before displaying it. If streaming is required, use a two-phase UI: an opaque 'thinking' state followed by the final answer. Never render the first token as a definitive answer if the model might pivot.
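A minimal sketch of the two delivery modes described above, assuming the model output arrives as an iterable of string tokens. The function name `deliver` and the event kinds `"thinking"`, `"final"`, and `"delta"` are hypothetical, chosen for illustration; they do not come from any particular streaming library.

```python
from typing import Iterable, Iterator, Tuple

# (kind, text). The kind names are hypothetical, for illustration only.
Event = Tuple[str, str]


def deliver(tokens: Iterable[str], high_stakes: bool) -> Iterator[Event]:
    """Turn a token stream into UI events.

    high_stakes=False: forward each token as soon as it arrives.
    high_stakes=True:  emit one opaque "thinking" event, buffer every
                       token, then emit the complete answer in one event.
    """
    if not high_stakes:
        for tok in tokens:
            yield ("delta", tok)
        return

    # Phase 1: show an opaque state; no model text is rendered yet.
    yield ("thinking", "")
    # Phase 2: consume the whole stream, then show the answer at once.
    yield ("final", "".join(tokens))
```

With `high_stakes=True`, a stream such as `["Yes", ". ", "Actually", ", no."]` reaches the UI as a single `"final"` event, so the opening 'Yes' is never displayed on its own. How a query gets classified as high-stakes is left to the caller.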
Journey Context:
LLMs are autoregressive: they generate one token at a time, each conditioned on the tokens before it, without committing to a full response in advance. Sometimes the model opens with a confident 'Yes', then pivots to 'Actually, no' once the reasoning it generates afterwards contradicts that opening. In a streaming UI, the user has already read and started processing 'Yes' by the time the pivot arrives. This creates a jarring experience that is worse than a simple wrong answer, because the user has to mentally undo their initial interpretation. Non-streaming delivery avoids this: the user sees the response only once it is complete, final position included. The tradeoff is latency, since buffering adds perceived delay. But for decision-critical contexts (medical advice, legal analysis, financial recommendations), the whiplash cost exceeds the latency cost. The subtlety: pivots tend to occur only on certain query types (ambiguous, multi-faceted), which makes them hard to detect in testing because your test queries are probably unambiguous.
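Because pivots are hard to spot in testing, one option is to scan buffered responses from deliberately ambiguous test queries for reversal phrases. The sketch below is a rough heuristic, not a reliable detector: the marker list `PIVOT_MARKERS` and the function `find_pivot` are hypothetical, the phrases are English-only assumptions, and a word like 'actually' will also match in responses that do not reverse direction.

```python
import re
from typing import Optional

# Hypothetical marker list; an assumption to tune per domain and language.
PIVOT_MARKERS = re.compile(
    r"\b(actually|on second thought|correction|i was wrong)\b",
    re.IGNORECASE,
)


def find_pivot(response: str) -> Optional[int]:
    """Return the character offset of the first reversal marker, or None."""
    match = PIVOT_MARKERS.search(response)
    return match.start() if match else None
```

A test harness could run this over responses to ambiguous queries and flag any response with a non-`None` offset for manual review.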
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:21:17.059497+00:00— report_created — created