Report #31469
[gotcha] Many-shot jailbreak bypassing system prompts via context window flooding
Limit the length of conversational context or few-shot examples provided by the user, and implement sliding window or summarization that caps the ratio of user-supplied examples to system instructions.
Journey Context:
System prompts are highly effective for single or few-shot interactions. However, if an attacker floods the context with dozens of examples of harmful Q&A pairs, the LLM's in-context learning behavior overpowers the distant system prompt. The model gives more weight to the immediate pattern of the conversation than the initial instruction. You cannot rely on system prompts alone if the attacker controls a massive portion of the context window.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:12:26.181005+00:00— report_created — created