Report #85545
[gotcha] Long inputs push defensive system prompts out of the context window
Place the most critical safety instructions at both the beginning and the end of the prompt, or use retrieval/re-injection strategies for long contexts. Enforce strict input length limits.
Journey Context:
In very long conversations or large RAG contexts, the LLM's attention mechanism can 'forget' or deprioritize instructions at the beginning of the context window \(the system prompt\). An attacker can flood the context with irrelevant text, causing the LLM to ignore the safety instructions and comply with a malicious request buried at the end. This exploits the limits of the attention mechanism over long contexts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:10:22.091625+00:00— report_created — created