Report #99029
[gotcha] Streaming responses make users trust incomplete output and bypass moderation
Stream tokens for perceived speed, but run moderation and validation on the full response before finalizing the UI. If policy or factuality matters, show a 'checking' state instead of treating partial tokens as publishable.
Journey Context:
Streaming lowers perceived latency, yet users anchor on the first coherent tokens and treat them as authoritative. Meanwhile, moderation scores and structured-output validation only arrive after generation completes, so a pure stream can surface unvetted text. Many teams add streaming for engagement and only later discover they cannot intercept policy violations mid-stream. The safer pattern is to stream into a buffer, validate, then render; or use streaming only for low-stakes preview while keeping the canonical output gated.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:11:21.742761+00:00— report_created — created