Report #85811
[gotcha] System prompt ignored due to context window overflow or distraction attacks
Keep system prompts concise and high-priority. Enforce input length limits and monitor the ratio of untrusted to trusted text. Consider placing system instructions at the end of the prompt (reported findings suggest LLMs attend more to the beginning and end of the context than to the middle).
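A minimal sketch of the length limit and ratio check described above, assuming character counts are an acceptable stand-in for tokens. The thresholds, the function names `check_untrusted_input` and `build_prompt`, and the delimiter strings are all hypothetical and not taken from any particular library; tune them for your model.

```python
# Hypothetical thresholds; character counts are a rough proxy for tokens.
MAX_UNTRUSTED_CHARS = 8_000
MAX_UNTRUSTED_RATIO = 20.0  # untrusted characters allowed per trusted character


def check_untrusted_input(system_prompt: str, user_input: str) -> tuple[bool, str]:
    """Return (ok, reason). Rejects input that is too long or that
    dwarfs the trusted system prompt."""
    if not system_prompt:
        return False, "empty system prompt"
    if len(user_input) > MAX_UNTRUSTED_CHARS:
        return False, "input exceeds length limit"
    ratio = len(user_input) / len(system_prompt)
    if ratio > MAX_UNTRUSTED_RATIO:
        return False, "untrusted-to-trusted ratio too high"
    return True, "ok"


def build_prompt(system_prompt: str, user_input: str) -> str:
    """Put the trusted instructions at the start and repeat them at the end,
    with the untrusted text fenced off in the middle (assumed layout)."""
    return (
        f"{system_prompt}\n\n"
        "--- BEGIN UNTRUSTED INPUT ---\n"
        f"{user_input}\n"
        "--- END UNTRUSTED INPUT ---\n\n"
        f"{system_prompt}"
    )
```

A caller would run `check_untrusted_input` first and only call `build_prompt` when it returns `ok`. Whether repeating the instructions at the end actually helps depends on the model, so treat it as something to measure rather than a guarantee.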
Journey Context:
Attackers flood the input with massive amounts of irrelevant text (or instruct the model to repeat a word hundreds of times). This can push the system prompt out of the model's context window or degrade its instruction following, so the model falls back on its base training behavior or complies with a malicious instruction buried at the end of the input.
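One crude way to flag the repetition variant of this attack is to measure how much of the input is a single repeated token. The sketch below is a heuristic under stated assumptions: whitespace tokenization, and hypothetical names and thresholds (`repetition_score`, `looks_like_flooding`, `min_tokens`, `max_share`). It will not catch floods made of varied filler text.

```python
from collections import Counter


def repetition_score(text: str) -> float:
    """Share of whitespace-separated tokens taken by the most common token."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    _, count = Counter(tokens).most_common(1)[0]
    return count / len(tokens)


def looks_like_flooding(text: str, min_tokens: int = 200, max_share: float = 0.5) -> bool:
    """Flag long inputs dominated by one repeated token.
    Both thresholds are illustrative defaults, not recommended values."""
    return len(text.split()) >= min_tokens and repetition_score(text) > max_share
```

Short inputs are never flagged, since a brief question can legitimately repeat a word; the check is only meant as one signal alongside length and ratio limits.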
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:37:21.149013+00:00— report_created — created