Report #93996
[gotcha] Streaming AI responses to users before content moderation allows harmful output to be displayed
Implement a 'display buffer' that holds 1-3 sentences ahead of what is rendered to the user, running lightweight client-side pattern matching on the buffer. For high-risk applications, use a two-pass architecture: first generate the complete response and run it through a moderation endpoint, then stream the pre-approved response to the client. Never rely solely on post-hoc moderation of already-displayed content.
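A minimal sketch of the display buffer described above, written in Python for brevity (a real client would likely port the same logic to its own UI language). The names `buffered_stream`, `BLOCKED`, and `WITHHELD` are hypothetical, the blocklist is a placeholder, and the sentence splitter is deliberately naive. It holds back `lookahead` complete sentences and scans everything not yet rendered before releasing anything.

```python
import re
from typing import Iterable, Iterator, List, Pattern

# Placeholder blocklist (assumption): a real product maintains its own patterns.
BLOCKED: List[Pattern[str]] = [re.compile(r"\bforbidden\b", re.IGNORECASE)]

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Marker emitted in place of the rest of the response once a pattern matches.
WITHHELD = "[response withheld]"


def buffered_stream(
    chunks: Iterable[str],
    lookahead: int = 2,
    patterns: List[Pattern[str]] = BLOCKED,
) -> Iterator[str]:
    """Yield sentences for rendering while holding `lookahead` sentences back."""
    pending = ""           # trailing text that is not yet a complete sentence
    held: List[str] = []   # complete sentences waiting to be released

    for chunk in chunks:
        pending += chunk
        parts = SENTENCE_END.split(pending)
        pending = parts.pop()      # last piece may be an unfinished sentence
        held.extend(parts)

        # Scan everything not yet shown, including the unfinished sentence.
        unseen = " ".join(held + [pending])
        if any(p.search(unseen) for p in patterns):
            yield WITHHELD
            return

        # Release only what exceeds the lookahead window.
        while len(held) > lookahead:
            yield held.pop(0)

    # End of stream: the final scan already covered `held` and `pending`.
    if pending:
        held.append(pending)
    yield from held
```

The caller renders each yielded sentence and re-joins them with spaces, since the splitter drops inter-sentence whitespace. Text released before a match stays on screen, so the lookahead size bounds how late a match can be caught; it does not replace server-side moderation.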
Journey Context:
The fundamental tension: streaming reduces perceived latency but eliminates the ability to pre-screen content. A model might begin a benign response and veer into harmful, incorrect, or policy-violating content mid-stream. By the time server-side moderation flags it, the user has already read the offending text. Post-hoc removal (deleting already-shown text) is jarring and erodes trust more than the original harmful content. The display-buffer approach trades a few hundred milliseconds of latency for a moderation window: the user still sees streaming text, but the system has a small lookahead to catch issues. For high-stakes products (kids, healthcare, finance), the two-pass approach is non-negotiable despite the latency cost. Many teams learn this only after a content incident that streaming made impossible to prevent.
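The two-pass approach for high-stakes products could be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `generate` and `is_allowed` are hypothetical callables standing in for the model call and the moderation endpoint, and `FALLBACK` is a placeholder message.

```python
from typing import Callable, Iterator

# Placeholder shown when moderation rejects the response (assumption).
FALLBACK = "Sorry, I can't help with that."


def two_pass_stream(
    generate: Callable[[], str],
    is_allowed: Callable[[str], bool],
    chunk_size: int = 20,
) -> Iterator[str]:
    """Generate fully, moderate once, then stream only approved text."""
    # Pass 1: produce the complete response; nothing is shown to the user yet.
    full_text = generate()

    # Moderation sees the whole response, so a late turn cannot slip through.
    if not is_allowed(full_text):
        yield FALLBACK
        return

    # Pass 2: replay the approved text in chunks to keep the streaming feel.
    for start in range(0, len(full_text), chunk_size):
        yield full_text[start:start + chunk_size]
```

The user waits for full generation plus the moderation call before the first chunk appears, which is the latency cost the paragraph above accepts for high-stakes products.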
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:21:32.775342+00:00— report_created — created