Report #85722
[gotcha] When the context window is filled with attacker-controlled text, the LLM effectively forgets the system prompt or safety instructions
Keep system prompts concise and place them as close to the user's current query as possible (e.g., at the end of the context, or re-injected at intervals). Enforce strict limits on the amount of untrusted text injected into the context.
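A minimal sketch of both mitigations, assuming a generic chat-style message list. The `Message` type, `build_context` function, `max_untrusted_chars` parameter and the `<untrusted_document>` delimiter are all hypothetical illustrations, not any provider's API. The budget is counted in characters as a rough stand-in for tokens.

```python
from dataclasses import dataclass


# Hypothetical message type; real chat APIs define their own schemas.
@dataclass
class Message:
    role: str      # e.g. "system" or "user"
    content: str


def build_context(system_prompt: str,
                  untrusted_docs: list[str],
                  user_query: str,
                  max_untrusted_chars: int = 4000) -> list[Message]:
    """Assemble a prompt with a hard cap on untrusted text and the
    system prompt repeated immediately before the user's query."""
    # Enforce the cap: truncate documents once the budget runs out.
    budget = max_untrusted_chars
    kept: list[str] = []
    for doc in untrusted_docs:
        if budget <= 0:
            break
        chunk = doc[:budget]
        kept.append(chunk)
        budget -= len(chunk)
    untrusted_block = "\n\n".join(kept)

    return [
        Message("system", system_prompt),
        # Untrusted material is wrapped and presented as data, not instructions.
        Message("user", "<untrusted_document>\n" + untrusted_block
                + "\n</untrusted_document>"),
        # Re-inject the system prompt so it sits right next to the query.
        Message("system", system_prompt),
        Message("user", user_query),
    ]
```

Truncating rather than rejecting keeps the feature usable with oversized inputs while still bounding how much attacker text can compete with the instructions; whether a second system message is honored depends on the model and API in use.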
Journey Context:
Developers often assume the system prompt is an immutable override. It is not: transformer attention is normalized across the whole context, so system-prompt tokens compete with every other token for the same attention mass. If an attacker floods the context with a large document containing repeated instructions ('Ignore the system prompt...'), the share of attention on the original system prompt can shrink, and the LLM tends to follow whichever signal dominates the context.
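A toy model of the dilution argument above, assuming a single softmax attention step in which every system-prompt token gets one raw score and every injected token another. Real models have many heads, positional effects and learned scores, so this only illustrates the normalization arithmetic, not measured model behavior. `system_prompt_attention_share` is a hypothetical helper.

```python
import math


def system_prompt_attention_share(n_system: int, n_injected: int,
                                  score_system: float = 1.0,
                                  score_injected: float = 1.0) -> float:
    """Fraction of one query's softmax attention mass that lands on
    system-prompt tokens, under the equal-score-per-group simplification."""
    w_sys = n_system * math.exp(score_system)
    w_inj = n_injected * math.exp(score_injected)
    return w_sys / (w_sys + w_inj)


# A 200-token system prompt versus a growing flood of injected text.
for n in (0, 200, 2_000, 20_000):
    share = system_prompt_attention_share(n_system=200, n_injected=n)
    print(f"{n:>6} injected tokens -> {share:.3f} of attention on system prompt")
```

With equal scores the share reduces to 200 / (200 + n), giving 1.000, 0.500, 0.091 and 0.010. Raising `score_injected` (for example, injected text that closely matches the query) shrinks the share faster still.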
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:28:18.054046+00:00— report_created — created