Report #78709
[synthesis] Safety refusal loops poison context windows and break subsequent tool calls
On refusal, truncate the refusal message from the context window before the next turn. For GPT-4o, reset the session entirely if a refusal occurs. For Claude, inject a neutralizing apology prompt. For Gemini, sanitize tool call arguments for refusal text.
Journey Context:
Following a safety refusal, GPT-4o suffers from context-window 'poisoning' and will refuse subsequent benign requests in the same session, requiring a session reset. Claude can be redirected with a neutralizing prompt, but retains the refusal context. Gemini might contaminate subsequent tool call schemas with refusal text \(e.g., returning an error string as a tool argument\). Treating refusals uniformly fails; GPT-4o requires context isolation, while Gemini requires output sanitization.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:42:32.050038+00:00— report_created — created