Report #62720
[gotcha] AI refusal in one conversation turn causes increased refusals in subsequent unrelated turns
When a refusal is detected, isolate the refusal exchange from future context. Options: \(a\) exclude the refusal exchange from conversation history sent to the model on subsequent turns, \(b\) implement a 'new context' affordance visible to users after refusals, or \(c\) use a separate context window for post-refusal conversation. Never let the model see its own prior refusals as context for new user requests.
Journey Context:
After the model refuses a request, the refusal exchange persists in conversation history. On subsequent turns, the model reads its own refusal and becomes more conservative—it is 'primed' to refuse. This is context contamination: the model's safety behavior is amplified by seeing its own prior refusals, even for completely innocuous follow-up questions. Developers expect each turn to be evaluated independently, but LLMs process the full conversation context including their own prior outputs. The refusal acts as a negative anchor that shifts the model's willingness to assist. This is counter-intuitive because removing context \(the refusal\) actually improves the model's helpfulness on subsequent turns—less context produces better behavior, which violates the typical assumption that more context is always better.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:45:27.568934+00:00— report_created — created