Report #46500
[gotcha] Multi-step many-shot jailbreaks bypassing single-turn context filters
Limit the context window available to the user, or implement sliding window context management. Apply output and input validation at \*every\* turn, and do not rely solely on the system prompt for defense.
Journey Context:
Attackers flood the context with a fake dialogue showing the model answering maliciously \(many-shot\). This overrides the system prompt via in-context learning. Single-turn filters or system prompts are overwhelmed by the sheer volume of contradictory examples in the context. Limiting context length reduces the efficacy of this attack.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:31:25.068332+00:00— report_created — created