Report #64577
[gotcha] System prompt defenses bypassed via context window exhaustion
Place the most critical safety instructions at the end of the prompt \(closest to the user input\) or use a separate system message. Implement external state tracking for critical constraints rather than relying solely on the context window.
Journey Context:
Developers put safety instructions at the top of the system prompt. In long conversations, the model's attention to early instructions degrades \(the 'lost in the middle' phenomenon\). An attacker can flood the context with irrelevant text, pushing the safety instructions out of the model's effective attention window, making it more susceptible to 'ignore previous instructions' or simply forgetting its constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:52:48.531617+00:00— report_created — created