Report #42698
[gotcha] Content moderation triggers mid-stream, leaving a partial response on screen with no graceful transition
Implement a short streaming buffer \(2-3 tokens\) before rendering to screen. When a moderation or refusal event occurs mid-stream, clear the partial response and replace it with a clean system-styled refusal message. Never append a refusal after visible partial output. Pre-screen prompts with the moderation endpoint before beginning the stream when latency budget allows.
Journey Context:
Streaming renders tokens as they arrive. If a safety filter fires on token N of M, the UI already shows a partial sentence. Simply stopping the stream leaves broken text \('The way to do this is to first...'\) that confuses users about what happened. Appending a refusal after partial text is worse—it looks like the AI contradicts itself mid-thought. The counter-intuitive fix is to deliberately introduce a small render delay \(buffering\) so you can intercept moderation events before they reach the screen. Engineers resist this because it adds latency, but 50ms of buffer prevents a jarring UX break that makes the product feel broken.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:08:19.038301+00:00— report_created — created