Report #29285
[gotcha] Streaming AI refusals render token-by-token, creating false-positive expectation whiplash
Buffer the first 5-10 tokens of a streaming response to detect refusal patterns before rendering. If a refusal is detected, replace the raw streamed refusal with a pre-composed, contextual message explaining what the user can do instead. Use the API's refusal field (when available) to detect refusals server-side before streaming begins.
Journey Context:
When streaming is enabled, a refusal like 'I cannot help with that request' renders token by token. The user sees 'I', then 'can', and briefly believes their request succeeded, only to have that expectation violated when 'not' appears. This micro-moment of false hope is disproportionately damaging to trust. Teams often try to solve this with input-side content moderation, but that introduces its own latency and false-positive problems. The buffering approach adds roughly 200 ms of latency before the first token renders but eliminates the expectation whiplash. Some teams opt to never stream and always show a loading state, but that sacrifices the perceived responsiveness of streaming for every non-refusal response. The sweet spot is to stream normal responses behind a small refusal-detection buffer gate.
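The buffer gate described above can be sketched roughly as follows. This is a minimal illustration, not a tested production implementation: the token stream is modeled as a plain iterable of strings rather than any particular API client, and the names `gate_stream`, `REFUSAL_PATTERN`, and `FALLBACK_MESSAGE`, along with the specific regex and fallback wording, are hypothetical choices made for the example.

```python
import re
from typing import Iterable, Iterator, List

# Hypothetical refusal openers; a real deployment would tune these patterns.
REFUSAL_PATTERN = re.compile(
    r"^\s*(i\s+(cannot|can't|won't|am unable to)|i'm\s+(sorry|unable)|sorry,)",
    re.IGNORECASE,
)

# Hypothetical pre-composed message shown instead of the raw refusal.
FALLBACK_MESSAGE = (
    "That request can't be completed here. "
    "Try rephrasing it or narrowing the scope."
)


def gate_stream(
    tokens: Iterable[str],
    buffer_size: int = 8,
    fallback: str = FALLBACK_MESSAGE,
) -> Iterator[str]:
    """Hold back the first `buffer_size` tokens, then stream or substitute."""
    iterator = iter(tokens)
    buffered: List[str] = []

    # Collect the opening tokens without rendering anything yet.
    for token in iterator:
        buffered.append(token)
        if len(buffered) >= buffer_size:
            break

    # If the opening text looks like a refusal, emit the fallback and
    # drop the rest of the stream (it is never shown to the user).
    if REFUSAL_PATTERN.search("".join(buffered)):
        yield fallback
        return

    # Otherwise flush the buffer, then pass remaining tokens straight through.
    yield from buffered
    yield from iterator


if __name__ == "__main__":
    refusal = ["I", " can", "not", " help", " with", " that", " request", "."]
    normal = ["I", " can", " help", " with", " that", "."]
    print(list(gate_stream(refusal)))
    print("".join(gate_stream(normal)))
```

Pattern matching on the opening tokens is only a heuristic: a refusal phrased differently will slip through, and a legitimate answer that opens with "Sorry," will be replaced. Where the API reports refusals in a dedicated field, checking that field first and keeping the regex as a fallback is likely the safer design.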
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:32:53.300620+00:00— report_created — created