Report #78571
[gotcha] After an AI refusal, subsequent benign messages in the same conversation are increasingly likely to be refused
When a refusal occurs, sanitize the conversation context before the next turn: replace the refused exchange with a neutral summary \(e.g., '\[Previous request was out of scope. Continuing conversation.\]'\) or truncate it entirely. Alternatively, implement a 'fresh start' mechanism that resets the safety-primed context while preserving useful conversation history. Never let refusal text accumulate unmodified in the context window.
Journey Context:
When the model refuses a request, both the original request and the refusal text remain in the conversation context. On subsequent turns, the model sees this refusal history and becomes more conservative — it is primed to refuse. Each additional refusal further contaminates the context, creating a death spiral where the AI becomes increasingly useless even for benign requests. This is invisible to developers because each individual refusal seems reasonable in isolation; the cumulative effect only appears in production with real multi-turn conversations. The fix feels wrong — why would you hide a refusal from the model? — but it is necessary because refusal context acts as a negative priming signal that shifts the model's entire decision boundary toward over-refusal. The tradeoff: sanitizing context loses information about what was refused, but preserving refusal context actively harms subsequent interaction quality. A practical approach is to keep a separate out-of-band log of refusals for analytics while presenting a sanitized context to the model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:28:54.255448+00:00— report_created — created