Report #72582
[gotcha] Streaming AI responses bypass content moderation safety checks
Implement a sliding-window moderation layer over the accumulated text rather than individual chunks, and pair it with a UI kill-switch that can retract or replace already-rendered toxic content. Buffer at least 1–2 sentences before rendering so moderation has a meaningful window of text to evaluate.
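A minimal Python sketch of the buffer-plus-sliding-window idea follows. It assumes a hypothetical `is_flagged(text) -> bool` callable that wraps whatever moderation endpoint you use, called synchronously for simplicity. Sentence splitting is a crude regex. `min_sentences` and `window_sentences` are illustrative defaults, not tuned values. Each new batch is checked together with a few already-rendered sentences, so harmful content that spans a batch boundary is judged as one piece of text.

```python
import re
from typing import Callable, Iterable, Iterator, List, Tuple

# Simplifying assumption: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def moderated_stream(
    chunks: Iterable[str],
    is_flagged: Callable[[str], bool],  # hypothetical wrapper around a moderation API
    min_sentences: int = 2,             # complete sentences to buffer before rendering
    window_sentences: int = 4,          # already-rendered sentences re-checked with new text
) -> Iterator[Tuple[str, str]]:
    """Yield ("render", text) for cleared text, or ("retract", "") and stop once flagged."""
    pending = ""              # partial sentence still arriving
    ready: List[str] = []     # complete sentences not yet rendered
    rendered: List[str] = []  # sentences the user has already seen

    def check(batch: List[str]) -> bool:
        # Sliding window: recent rendered context plus the new batch.
        start = max(0, len(rendered) - window_sentences)
        return is_flagged(" ".join(rendered[start:] + batch))

    for chunk in chunks:
        pending += chunk
        *complete, pending = _SENTENCE_END.split(pending)
        ready.extend(complete)
        if len(ready) >= min_sentences:
            if check(ready):
                yield ("retract", "")  # kill-switch: the UI should pull the whole reply
                return
            yield ("render", " ".join(ready))
            rendered.extend(ready)
            ready = []

    # Stream finished: flush the remainder, still behind the moderation check.
    tail = ready + ([pending.strip()] if pending.strip() else [])
    if tail:
        if check(tail):
            yield ("retract", "")
        else:
            yield ("render", " ".join(tail))
```

When a "retract" event arrives, the caller should also cancel the upstream model request so no further tokens are generated or billed.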
Journey Context:
When you enable streaming to improve perceived latency, each token chunk is too short for a moderation API to evaluate meaningfully. By the time enough context accumulates for a moderation endpoint to flag harmful content, the user has already read it. Teams ship streaming, get a content safety incident, and realize that the moderation endpoint they relied on in non-streaming mode is now effectively inert: it either sees fragments it cannot judge or sees the full text only after it has been displayed. The tradeoff: buffering the full response defeats streaming's latency benefit, but zero buffering means no safety check before the user sees the text. The right call is a hybrid. Stream with a small buffer, and add a retraction mechanism in the UI so that if moderation triggers on accumulated text, you can swap out the offending content. This is not optional for any consumer-facing product.
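The retraction mechanism requires the client to remember what it has rendered so that it can swap one segment or pull the whole reply. The sketch below models that state in Python for consistency. `StreamedMessage` and the placeholder text are hypothetical, and a real UI would re-render from `text` after each change.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical placeholder shown when the kill-switch fires.
RETRACTED_NOTICE = "[This response was removed by content moderation.]"


@dataclass
class StreamedMessage:
    """Client-side state for one streamed reply, kept so rendered text can be retracted."""
    segments: List[str] = field(default_factory=list)
    retracted: bool = False

    def append(self, text: str) -> int:
        """Render a moderation-cleared segment; return its index, or -1 after retraction."""
        if self.retracted:
            return -1  # drop late chunks that arrive after the kill-switch fired
        self.segments.append(text)
        return len(self.segments) - 1

    def replace(self, index: int, text: str) -> None:
        """Swap out one already-rendered segment that moderation flagged after the fact."""
        if not self.retracted:
            self.segments[index] = text

    def retract(self) -> None:
        """Kill-switch: hide everything rendered so far and ignore further updates."""
        self.segments = [RETRACTED_NOTICE]
        self.retracted = True

    @property
    def text(self) -> str:
        """What the UI should currently display."""
        return " ".join(self.segments)
```

Replacing a single segment is the lighter touch. Retracting the whole reply is the safer default when the flagged content depends on surrounding context.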
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:25:04.704487+00:00— report_created — created