Report #40775
[agent\_craft] User floods the context window with many benign but borderline examples to normalize a harmful request at the end
Implement strict role separation and stateless permission checks. Do not grant elevated privileges based on conversational momentum or the volume of prior examples.
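One way to read the fix above is sketched here, under stated assumptions: privileges are looked up from a role that a trusted session layer sets, and the check never consults earlier turns. The names `Role`, `PERMISSIONS`, `Request`, and `is_permitted` are hypothetical and are not taken from the report.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Role(Enum):
    USER = "user"
    OPERATOR = "operator"


# Hypothetical permission table: privileges depend on the authenticated role only.
PERMISSIONS = {
    Role.USER: {"read_docs"},
    Role.OPERATOR: {"read_docs", "run_tool"},
}


@dataclass(frozen=True)
class Request:
    role: Role    # assumed to be set by a trusted session layer, never parsed from message text
    action: str   # the capability being requested


def is_permitted(request: Request, history: Sequence[str] = ()) -> bool:
    """Stateless check: `history` is accepted but deliberately ignored."""
    return request.action in PERMISSIONS[request.role]
```

Because `history` never enters the decision, a context holding 500 prior examples gets the same answer as an empty one.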
Journey Context:
Attackers use 'many-shot jailbreaks' to overwhelm safety alignment: they fill the context with a long run of borderline examples so that the final harmful request reads as a continuation of an established pattern. The fix is to evaluate each request independently against core policies, so that neither the number nor the tone of earlier turns changes the decision.
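The independence property described above can be checked mechanically. The following is a minimal sketch, not a real safety classifier: `BLOCKED_MARKERS` is a placeholder policy, and `stateless_decide`, `momentum_decide`, and `is_momentum_invariant` are hypothetical names introduced for illustration. It compares the verdict for the same request with an empty context and with a flooded one.

```python
from typing import Callable, List

# Hypothetical signature: (history, request) -> True to allow, False to refuse.
Decide = Callable[[List[str], str], bool]

BLOCKED_MARKERS = ("disable safety",)  # placeholder policy, illustration only


def stateless_decide(history: List[str], request: str) -> bool:
    """Allow or refuse based on the current request alone; history is ignored."""
    text = request.lower()
    return not any(marker in text for marker in BLOCKED_MARKERS)


def momentum_decide(history: List[str], request: str) -> bool:
    """Anti-pattern for contrast: enough prior examples unlock the request."""
    return len(history) >= 100 or stateless_decide(history, request)


def is_momentum_invariant(decide: Decide, request: str, shots: int = 256) -> bool:
    """True if the verdict is the same with an empty context and a flooded one."""
    flooded = [f"borderline example {i}" for i in range(shots)]
    return decide([], request) == decide(flooded, request)
```

Run against a fixed set of disallowed requests, a check of this kind could serve as a regression test: `stateless_decide` passes it, while `momentum_decide` fails once the flooded context crosses its threshold.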
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:54:47.281430+00:00— report_created — created