Report #50920
[gotcha] Streaming response renders partial content before content filter refusal, forcing awkward retraction
Implement a small token buffer (5-10 tokens) before rendering streamed content. When finish_reason is 'content_filter' after partial content has rendered, replace the partial content with a graceful message like 'This response was filtered for safety' rather than silently truncating or showing a raw error.
Journey Context:
When streaming from AI APIs, a content filter refusal can arrive after partial content has already been sent to the client. The naive approach, streaming every token directly to the DOM, means users see content appear and then either vanish or cut off mid-sentence when the filter triggers. This is worse than a refusal issued before anything renders, because the take-back experience destroys trust. The tradeoff: any buffering adds latency to the first rendered token, reducing the perceived speed benefit of streaming. But showing-then-retracting is far more damaging than a small delay. The right call is a small buffer that catches most filter refusals (which typically trigger within the first few tokens) combined with a graceful retraction UI for edge cases where content passes the initial buffer but is still filtered.
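A minimal sketch of the buffer-then-retract idea, written in Python for brevity. The chunk shape (a text delta plus an optional finish_reason), the `BufferedRenderer` class, and the `rendered` string standing in for the DOM are all assumptions made for illustration; they are not any particular provider's SDK. Only the 'content_filter' value and the replacement message come from the report above.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

FILTER_MESSAGE = "This response was filtered for safety"

# Hypothetical chunk shape: (text_delta, finish_reason).
# finish_reason stays None until the final chunk of the stream.
Chunk = Tuple[str, Optional[str]]


@dataclass
class BufferedRenderer:
    buffer_size: int = 8                 # tokens held back before the first render
    pending: List[str] = field(default_factory=list)
    rendered: str = ""                   # stand-in for what the user currently sees
    started: bool = False                # True once the initial buffer has been released
    filtered: bool = False               # True once a content_filter finish was seen

    def _flush(self) -> None:
        # Move everything that is buffered into the visible output.
        self.rendered += "".join(self.pending)
        self.pending.clear()

    def feed(self, text: str, finish_reason: Optional[str] = None) -> None:
        if self.filtered:
            return                       # ignore anything after a refusal
        if finish_reason == "content_filter":
            # Drop buffered tokens and replace whatever was already shown.
            self.pending.clear()
            self.rendered = FILTER_MESSAGE
            self.filtered = True
            return
        if text:
            self.pending.append(text)
        # Release the buffer once it is full, keep streaming afterwards,
        # and always flush on a normal end of stream.
        if self.started or len(self.pending) >= self.buffer_size or finish_reason is not None:
            self.started = True
            self._flush()

    def close(self) -> None:
        # The stream ended without a finish_reason: show what is left.
        if not self.filtered:
            self._flush()


def render_stream(chunks: Iterable[Chunk], buffer_size: int = 8) -> str:
    renderer = BufferedRenderer(buffer_size=buffer_size)
    for text, reason in chunks:
        renderer.feed(text, reason)
    renderer.close()
    return renderer.rendered
```

In a real client, the assignments to `rendered` would be DOM updates, and the retraction branch is where a transition or explanatory note could soften the replacement. The buffer size is the knob for the latency tradeoff described above: a larger value hides more early refusals but delays the first visible token.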
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:57:07.303623+00:00 - report_created - created