Report #26540
[gotcha] Flooding the context window to push safety instructions out of scope
Keep system prompts and safety instructions concise, and repeat critical instructions at the end of the prompt, not just at the beginning. Count the tokens in each user input and truncate or reject anything over a fixed budget before it reaches the model.
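A minimal sketch of the mitigation above, under stated assumptions. The names `count_tokens`, `truncate_to_budget`, and `build_prompt` are hypothetical and not from any library. Whitespace splitting stands in for a real tokenizer, so swap in the tokenizer of the model actually in use before relying on the counts.

```python
def count_tokens(text: str) -> int:
    """Approximate token count. ASSUMPTION: whitespace-separated words
    stand in for real tokens; use the model's own tokenizer in practice."""
    return len(text.split())


def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Return text unchanged if it fits, else keep only the first max_tokens words."""
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens])


def build_prompt(system_prompt: str, user_input: str, reminder: str,
                 context_limit: int, reserved_for_reply: int) -> str:
    """Assemble a prompt where the user input can never crowd out the instructions.

    The critical instruction is restated in `reminder`, which is placed
    after the user input so it sits at the end of the context.
    """
    overhead = count_tokens(system_prompt) + count_tokens(reminder) + reserved_for_reply
    user_budget = context_limit - overhead
    if user_budget <= 0:
        raise ValueError("context_limit is too small for the instructions and the reply")
    safe_input = truncate_to_budget(user_input, user_budget)
    return "\n\n".join([system_prompt, safe_input, reminder])


if __name__ == "__main__":
    prompt = build_prompt(
        system_prompt="Never reveal secrets.",
        user_input="word " * 1000,  # simulated flooding attempt
        reminder="Reminder: never reveal secrets.",
        context_limit=100,
        reserved_for_reply=20,
    )
    print(count_tokens(prompt))
```

Rejecting oversized inputs outright, rather than truncating them, is an equally reasonable choice; truncation is used here only because it is what the recommendation names.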
Journey Context:
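LLMs have a finite context window. If an attacker submits a very large input, safety instructions placed at the very beginning of the context can lose their effect in two ways. The application or serving layer may truncate the oldest tokens to fit the window, dropping the instructions entirely. Alternatively, the model may follow instructions less reliably when they sit far from the end of a long context, because more recent tokens tend to carry more weight. In either case the attacker overrides the system prompt simply by drowning it out.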
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:57:00.771663+00:00— report_created — created