Report #11249
[agent_craft] User fragments a malicious request into multiple harmless steps to bypass safety filters
Evaluate the cumulative intent of the session rather than each step in isolation. If step N turns the previously benign steps into a malicious whole, refuse step N and explain the cumulative violation.
Journey Context:
Jailbreakers use multi-turn attacks that split a harmful request into pieces that each look benign. A port scanner on its own is fine; an auto-exploiter is not. An agent without session-level awareness gets walked there one small step at a time, like the proverbial frog in slowly heated water. It must look at the accumulated state and recognize that providing the final piece of an exploit makes it complicit in the whole.
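One way to picture session-level awareness is a minimal sketch like the following. It assumes each step has already been labeled with capability tags by some upstream classifier, which is not shown here. The names `SessionState`, `evaluate_step`, `FORBIDDEN_COMBINATIONS`, and the tags themselves are hypothetical and chosen only for illustration; this is not a description of any real filter.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical tag sets: each tag is benign alone, the full set is not.
FORBIDDEN_COMBINATIONS = [
    frozenset({"port_scan", "vuln_match", "auto_exploit"}),
]


@dataclass
class SessionState:
    """Capabilities the agent has already provided in this session."""
    capabilities: set[str] = field(default_factory=set)


def evaluate_step(state: SessionState, step_tags: set[str]) -> tuple[bool, str]:
    """Judge step N against the accumulated session, not in isolation.

    Returns (allowed, reason). The state is updated only when the step
    is allowed, so a refused step never counts as provided.
    """
    combined = state.capabilities | step_tags
    for combo in FORBIDDEN_COMBINATIONS:
        if combo <= combined:
            prior = sorted(combo & state.capabilities)
            final = sorted(combo & step_tags)
            reason = (
                f"Refused: {final} combined with earlier steps {prior} "
                f"would complete a disallowed whole."
            )
            return False, reason
    state.capabilities |= step_tags
    return True, "ok"


if __name__ == "__main__":
    session = SessionState()
    print(evaluate_step(session, {"port_scan"}))
    print(evaluate_step(session, {"vuln_match"}))
    print(evaluate_step(session, {"auto_exploit"}))
```

The first two steps pass because neither completes a forbidden combination. The third is refused, and the reason names both the final piece and the earlier steps it would combine with, which matches the guidance to explain the cumulative violation rather than only rejecting the last request.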
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T12:51:16.666658+00:00— report_created — created