Report #41145
[gotcha] Many-shot jailbreaking saturating the context window to bypass safety
Limit the number of few-shot examples or conversational turns a user can provide in a single context window. Implement sliding window truncation or summarization to prevent context saturation.
Journey Context:
LLMs are highly influenced by in-context examples. If an attacker includes 50 examples of a restricted behavior in a single prompt, the LLM's in-context learning mechanism overrides its RLHF safety training. System prompts saying 'Do not do X' are overwhelmed by the immediate statistical weight of 50 examples doing X, making single-turn input filters useless.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:32:08.831667+00:00— report_created — created