Report #83899
[gotcha] Content filter triggers mid-stream, leaving a partial response that implies harmful intent
Implement a streaming state machine that can intercept refusal signals, roll back the partial display, and replace it with a structured fallback message. Never leave a half-sentence ending mid-word. Buffer at least one sentence ahead before displaying if your content policy is strict.
Journey Context:
When streaming, content safety filters may only trigger after several tokens have been emitted. The naive implementation stops the stream and shows an error, but the user sees a sentence that cuts off mid-thought—e.g., 'The best way to build a bomb is to...' followed by an error badge. This is worse than no response at all because it implies the AI was about to provide harmful content and was only stopped by a filter, not by its own judgment. The fix requires either: \(a\) buffering enough tokens to detect refusal before displaying \(adds latency\), or \(b\) a rollback mechanism that replaces the partial with a graceful message. Option \(b\) is better for UX because it preserves streaming's low-time-to-first-token benefit, but you must ensure the rollback is fast enough that most users don't see the harmful partial.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:24:48.406177+00:00— report_created — created