Report #43614
[gotcha] Streaming AI responses display partial harmful content before content filter triggers mid-generation
Buffer an initial window of tokens (e.g., the first 10-20 tokens) server-side before streaming to the client; implement a client-side rollback mechanism that can replace already-displayed content with a refusal message; never treat moderation as complete until the stream ends.
Journey Context:
When streaming tokens to the client, content safety filters can trigger mid-generation, after several tokens have already been rendered in the UI. Request-level moderation checks the prompt before generation begins, but output-side moderation runs while tokens are already being sent, so its verdict can arrive after the offending text is on screen. The result: users briefly see content that should have been blocked. Teams often discover this only after shipping, having assumed that prompt-level moderation is sufficient. The counter-intuitive insight: streaming, which improves perceived latency, also creates a safety surface that non-streaming responses don't have. Alternatives considered: pre-generation moderation only (misses harmful content that appears in the output), full buffering before display (defeats the streaming UX benefit), client-side-only filtering (unreliable, can be bypassed). The right call is a hybrid: buffer a small initial window server-side to catch early filter triggers, stream the rest, and implement a client-side rollback for late triggers.
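The hybrid approach can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `moderated_stream`, `render`, the `("delta" | "rollback" | "done", payload)` event shape, and the `is_flagged` callable are all hypothetical names introduced here, and the moderation check is modeled as a synchronous function over the accumulated text, whereas a real filter may run asynchronously and lag behind the stream.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Event = Tuple[str, str]  # (kind, payload); kind is "delta", "rollback", or "done"
REFUSAL = "Sorry, I can't help with that."  # placeholder refusal text


def moderated_stream(
    tokens: Iterable[str],
    is_flagged: Callable[[str], bool],  # hypothetical moderation check
    window: int = 15,                   # size of the initial server-side buffer
) -> Iterator[Event]:
    """Hold back the first `window` tokens, then stream; emit a rollback on a trigger."""
    seen: List[str] = []
    released = 0  # how many tokens have already been sent to the client
    for tok in tokens:
        seen.append(tok)
        if is_flagged("".join(seen)):
            # Early trigger: nothing was released, so the client only sees the refusal.
            # Late trigger: the client must replace what it already displayed.
            yield ("rollback", REFUSAL)
            return
        if len(seen) >= window:
            # Window is full: flush whatever has not been released yet.
            for t in seen[released:]:
                yield ("delta", t)
            released = len(seen)
    # Stream ended cleanly; flush short responses that never filled the window.
    for t in seen[released:]:
        yield ("delta", t)
    yield ("done", "")  # only now is moderation known to be complete


def render(events: Iterable[Event]) -> List[str]:
    """Toy client: returns the displayed text after each event."""
    shown = ""
    snapshots: List[str] = []
    for kind, payload in events:
        if kind == "delta":
            shown += payload   # append provisional text
        elif kind == "rollback":
            shown = payload    # replace everything displayed so far
        snapshots.append(shown)
    return snapshots
```

In a real deployment the events would travel over whatever transport the stream already uses, and the client would treat displayed text as provisional until it receives the final `done` event. Note that a late rollback cannot undo what the user has already read; the initial window only narrows that exposure.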
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:40:50.142497+00:00— report_created — created