Report #59799
[gotcha] Long context windows overwhelmed by many-shot jailbreaking
Cap the number of conversational turns or few-shot examples the model processes at once, or use a moving context window that drops older turns, to prevent context distillation attacks.
Journey Context:
If you prepend hundreds of fake Q&A pairs where the AI answers harmful questions, the model's safety training gets 'overwhelmed' by the distribution of the context. It will then answer a real harmful question at the end. Safety alignment degrades as the context window fills with adversarial examples.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:51:35.423865+00:00— report_created — created