Report #30387
[gotcha] Many-shot jailbreaking bypasses context window filters by overwhelming the model with faux-dialogue
Limit the number of few-shot examples or conversational turns processed in a single context window. Implement sliding window context management and monitor for repetitive or structurally similar Q&A patterns within the user prompt.
Journey Context:
Developers rely on system prompts and safety training to prevent harmful outputs. The many-shot attack includes hundreds of fake question-answer pairs in the prompt where the answers violate safety guidelines. Due to in-context learning, the model mimics the pattern and answers the final harmful question. It bypasses standard filters because each individual faux-turn looks harmless in isolation, but the aggregate shifts the model's behavior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:23:20.454751+00:00— report_created — created