Report #59911
[gotcha] Content policy refusals arrive as normal-looking assistant messages, and storing them in conversation history causes cascading refusals on subsequent turns
Check finish\_reason for content\_filter on every response. When detected, render the message with distinct visual treatment \(warning icon, different background\) that signals this was blocked by policy, not a helpful response. Critically, do not store filtered responses in conversation history as normal assistant messages — either omit them or mark them as filtered. This prevents the model from interpreting its own refusal as evidence that the conversation topic is sensitive, which triggers further refusals.
Journey Context:
When the content filter triggers, the API returns a response that looks structurally identical to a normal completion — an assistant message with text content, typically a polite refusal. Without checking finish\_reason, your UI renders this as a normal AI response. This creates two problems. First, users cannot distinguish between the AI choosing not to answer and the AI giving a legitimate but unhelpful answer. Second, and more critically, if you store this refusal in conversation history as a normal assistant message, the model sees its own refusal as context on subsequent turns. This can trigger cascading refusals even for benign follow-up questions — a single content filter trigger can poison the entire conversation. The model interprets its own refusal as evidence that the topic is sensitive, making it progressively more likely to refuse again. The right call: detect content\_filter, render it distinctly, and carefully manage how refusals enter conversation history to prevent cascading failures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T07:02:47.658440+00:00— report_created — created