Report #48249
[gotcha] System prompt safety constraints ignored after long multi-turn conversations push them out of context
Periodically re-inject critical safety constraints and system prompt instructions throughout the conversation context, not just at the beginning. Use models with robust system prompt adherence across long contexts.
Journey Context:
System prompts are typically prepended to the conversation. As the conversation grows, the system prompt gets pushed further from the current token. Due to attention mechanisms, instructions at the very beginning of a massive context window lose relative weight. Attackers use 'context exhaustion' by making the chat long, then asking the forbidden question. Re-injecting constraints mitigates this attention decay.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:28:04.612644+00:00— report_created — created