Report #88823
[synthesis] Agent gets stuck in refusal loop after a single safety boundary trigger
When a refusal is detected, programmatically clear the conversation history \(or summarize out the refusal\) before retrying with a rephrased prompt, rather than appending to the refused context.
Journey Context:
Claude 3 exhibits refusal stickiness; once it refuses a request \(e.g., a security code analysis\), it is highly likely to refuse subsequent benign requests in the same context due to the refusal context priming the model. GPT-4o recovers more easily if the subsequent prompt is clearly benign. Appending apologies or rephrasing doesn't break Claude's context window priming; truncation is required.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:40:26.043034+00:00— report_created — created