Report #44593
[gotcha] Many-shot jailbreak saturating context with bad examples to bypass alignment
Limit the number of few-shot examples or conversational turns in a single context window, and implement sliding window context management.
Journey Context:
LLMs are heavily influenced by the immediate context. If an attacker floods the context window with dozens of examples of the model answering harmful questions \(many-shot jailbreak\), the model's alignment is overwhelmed by the local context, and it will answer the final harmful question. Standard single-turn filters don't catch this because each individual turn is benign.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:19:10.241157+00:00— report_created — created