Report #39424
[gotcha] Many-Shot Jailbreaking via Context Exhaustion
Cap the amount of context a user can supply, or run sliding-window classifiers over the conversation to detect accumulating harmful content. Fine-tuning the model to resist adversarial in-context examples is a further option.
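A minimal sketch of the sliding-window idea, in Python. Everything named here is an assumption for illustration: `keyword_score` is a stand-in for a real safety classifier, and `SlidingWindowMonitor`, its `window`, `threshold`, and `max_turns` parameters, and the flagged-term list are hypothetical. The monitor also enforces a turn budget, which approximates the context-size cap.

```python
from collections import deque
from typing import Callable, Deque


def keyword_score(text: str) -> float:
    """Placeholder scorer (assumption): fraction of flagged terms present.

    A real deployment would call a trained safety classifier here.
    """
    flagged = ("bypass", "exploit", "weapon")  # hypothetical term list
    lowered = text.lower()
    hits = sum(1 for term in flagged if term in lowered)
    return hits / len(flagged)


class SlidingWindowMonitor:
    """Tracks the mean score of the most recent `window` turns."""

    def __init__(
        self,
        scorer: Callable[[str], float],
        window: int = 8,
        threshold: float = 0.3,
        max_turns: int = 64,
    ) -> None:
        self.scorer = scorer
        self.threshold = threshold
        self.max_turns = max_turns
        # deque(maxlen=...) discards the oldest score automatically.
        self.scores: Deque[float] = deque(maxlen=window)
        self.turns_seen = 0

    def observe(self, turn: str) -> bool:
        """Return True if the conversation should be blocked or escalated."""
        self.turns_seen += 1
        self.scores.append(self.scorer(turn))
        # Hard cap on conversation length (the context-size limit).
        over_budget = self.turns_seen > self.max_turns
        # Only judge the mean once the window is full, to limit false alarms.
        window_full = len(self.scores) == self.scores.maxlen
        mean = sum(self.scores) / len(self.scores)
        return over_budget or (window_full and mean >= self.threshold)
```

Averaging over a window, rather than judging each turn alone, is what lets this catch gradual accumulation: no single turn has to cross the threshold. The threshold and window size would need tuning against real traffic.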
Journey Context:
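LLMs are few-shot learners. If you stuff the prompt with 50 examples of the form 'How to make X? -> Step 1...', the model tends to continue the pattern. This can bypass RLHF-trained refusals: the in-context examples do not change the model's weights, but with enough shots, in-context learning can outweigh the behavior instilled by safety training.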
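Because the attack depends on many repeated shots inside one prompt, a cheap structural check can count them before the prompt reaches the model. The sketch below is a heuristic only. It assumes shots use the literal 'question? -> answer' shape from the example above; `count_shots`, `looks_many_shot`, and the `max_shots` default are hypothetical names and values, and real attacks can use other formats.

```python
import re

# Assumption: each shot looks like "question? -> answer".
SHOT_PATTERN = re.compile(r"\?\s*->")


def count_shots(prompt: str) -> int:
    """Count occurrences of the question-arrow-answer shape in the prompt."""
    return len(SHOT_PATTERN.findall(prompt))


def looks_many_shot(prompt: str, max_shots: int = 16) -> bool:
    """Flag prompts that embed more demonstrations than max_shots."""
    return count_shots(prompt) > max_shots


# A prompt stuffed with 50 demonstrations, as in the scenario above.
stuffed = "\n".join(f"How to make item {i}? -> Step 1..." for i in range(50))
print(count_shots(stuffed), looks_many_shot(stuffed))
```

A pattern check like this is easy to evade by reformatting the shots, so it is better suited as a first filter in front of a classifier than as a standalone defense.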
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:38:41.120870+00:00 - report_created - created