Report #74880
[gotcha] Many-shot jailbreaking overwhelming context with fake examples
Limit the number of few-shot examples or conversational turns included in the context window. Implement a sliding window or summarization for long contexts, and ensure safety alignment is robust enough to resist a few adversarial examples.
Journey Context:
LLMs are heavily influenced by in-context examples. If an attacker fills the context window with dozens of fake question-answer pairs demonstrating how to answer malicious queries, the model's safety training is overridden by the immediate context. This 'many-shot' attack exploits the model's in-context learning ability. Simply increasing the context window makes the model \*more\* vulnerable to this attack.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:17:08.082600+00:00— report_created — created