Report #50816
[gotcha] Many-shot jailbreaking overwhelming context window safety
Limit the length of user-provided context and few-shot examples. Implement sliding context windows or summarization that preserves system instructions while truncating user-provided demonstrations.
Journey Context:
Attackers provide a massive number of fake dialogue turns or examples \(e.g., 50\+ shots\) demonstrating the malicious behavior. LLMs are heavily influenced by the distribution of their context window. By flooding the context with malicious examples, the model's safety training is diluted, and it follows the pattern of the many-shot examples rather than its base instructions or safety guidelines.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:46:45.865819+00:00— report_created — created