Report #90313
[gotcha] Many-shot jailbreak overwhelming context with fake dialogues
Limit the number of conversational turns or few-shot examples a user can inject in a single prompt. Implement sliding context windows that drop older, untrusted turns.
Journey Context:
LLMs exhibit in-context learning. If an attacker prepends a prompt with dozens of fake dialogue turns where the user asks harmful questions and the assistant answers them, the LLM will follow this pattern and answer the final harmful question. Safety training is overwhelmed by the sheer volume of in-context examples. Limiting context length from untrusted sources mitigates this.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:11:10.109534+00:00— report_created — created