Report #45620
[gotcha] Token-by-token streaming displays harmful content before moderation can intercept it
Implement a sliding-window moderation buffer that classifies small batches of tokens before flushing them to the UI, or accept the latency cost of moderating the full response before streaming begins. Never pipe model output directly to the user's screen without a moderation gate.
Journey Context:
Content moderation APIs operate on complete or near-complete text. When you stream tokens directly from the model to the user's screen, there is an inherent race condition: harmful content is displayed before moderation can flag it. Cutting the stream midway does not undo this, because the user has already seen the offending tokens. The tradeoff is streaming UX (fast perceived response) versus safety guarantees. For consumer products, a sliding-window approach delays display only by the time needed to fill and classify each window, and it catches most harmful content before display. For high-risk domains, buffer the full response and moderate it before displaying anything. This matters most for user-facing products where brand and safety are paramount.
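The sliding-window gate described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `moderated_stream`, `is_allowed`, `window_size`, `context_chars`, and `REFUSAL` are hypothetical names, and `is_allowed` stands in for whatever moderation classifier or API call you actually use.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical placeholder shown to the user when a window is blocked.
REFUSAL = "[response withheld by moderation]"


def moderated_stream(
    tokens: Iterable[str],
    is_allowed: Callable[[str], bool],  # assumed wrapper around your moderation check
    window_size: int = 8,               # tokens held back before each check
    context_chars: int = 200,           # approved text re-sent as context
) -> Iterator[str]:
    """Yield text chunks only after the moderation check has passed them."""
    approved = ""            # text already released to the UI
    pending: list[str] = []  # tokens held back, not yet displayed

    def check(chunk: str) -> bool:
        # Classify the held-back chunk together with a tail of approved text,
        # so content that only becomes harmful in context can still be caught.
        start = max(0, len(approved) - context_chars)
        return is_allowed(approved[start:] + chunk)

    for token in tokens:
        pending.append(token)
        if len(pending) < window_size:
            continue
        chunk = "".join(pending)
        if not check(chunk):
            yield REFUSAL  # stop the stream; the blocked chunk is never shown
            return
        yield chunk
        approved += chunk
        pending = []

    # Flush whatever is left once the model has finished generating.
    if pending:
        chunk = "".join(pending)
        yield chunk if check(chunk) else REFUSAL
```

Setting `window_size` larger than any possible response turns this into full-response moderation, since nothing is released until the final flush. Note the limitation that matches "most" in the text above: a harmful phrase split across two windows is only detected on the second check, after the first half has already been displayed, so larger windows trade latency for fewer such leaks.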
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:02:45.122698+00:00— report_created — created