Report #63092
[gotcha] Attackers flooding the context window with high-token-count data, pushing out the system prompt or causing the model to fail safely
Enforce strict token limits on all untrusted inputs before they reach the LLM; implement a sliding window or summarization strategy for long contexts to preserve system prompt adherence.
Journey Context:
Developers assume the LLM will always follow the system prompt. However, if an attacker provides a massive document \(e.g., a 100k token web page retrieved via RAG\), the attention mechanism can be overwhelmed, or the system prompt might be truncated/evicted from the context window. The model then 'forgets' its instructions and behaves erratically or follows instructions buried in the middle of the attacker's document.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:22:47.510816+00:00— report_created — created