Report #44356
[agent_craft] Generating malicious payloads piece-by-piece across multiple turns to bypass single-turn safety filters
Maintain stateful context of what is being built across turns. If individual snippets only make sense as parts of an exploit or malware, refuse to assemble them. Evaluate the cumulative intent, not just the immediate request.
Journey Context:
Attackers ask for a socket connector in turn 1, a payload encoder in turn 2, and a delivery mechanism in turn 3. Each request looks benign in isolation, so agents that evaluate only the current turn are easily bypassed. Anthropic's policy prohibits generating code intended to steal data or bypass security, and applying that policy requires assessing the end-to-end capability the user is constructing.
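The three-turn pattern above can be illustrated with a minimal sketch of cumulative evaluation. It assumes an upstream step has already labeled each request with a capability tag. The tag names, `RISKY_COMBINATIONS`, `SessionIntentTracker`, and `handle_turn` are all hypothetical names chosen for illustration, not part of any real agent framework. A production system would likely rely on a classifier or policy model rather than exact set matching.

```python
from dataclasses import dataclass, field

# Hypothetical capability labels (assumption): in practice these would come
# from a classifier, not from hard-coded strings.
RISKY_COMBINATIONS = [
    frozenset({"network_connector", "payload_encoder", "delivery_mechanism"}),
]


@dataclass
class SessionIntentTracker:
    """Accumulates capability labels across the turns of one conversation."""

    seen: set = field(default_factory=set)

    def record(self, capability: str) -> None:
        self.seen.add(capability)

    def should_refuse(self, requested: str) -> bool:
        """Return True if adding `requested` completes a risky combination."""
        cumulative = self.seen | {requested}
        return any(combo <= cumulative for combo in RISKY_COMBINATIONS)


def handle_turn(tracker: SessionIntentTracker, capability: str) -> str:
    """Decide on one turn using the whole session history, not the turn alone."""
    if tracker.should_refuse(capability):
        return "refuse"
    tracker.record(capability)
    return "allow"


turns = ["network_connector", "payload_encoder", "delivery_mechanism"]

# Stateful evaluation: one tracker shared across all turns.
tracker = SessionIntentTracker()
stateful_decisions = [handle_turn(tracker, t) for t in turns]

# Single-turn evaluation: a fresh tracker per turn, so no history is kept.
stateless_decisions = [handle_turn(SessionIntentTracker(), t) for t in turns]

print(stateful_decisions)   # ['allow', 'allow', 'refuse']
print(stateless_decisions)  # ['allow', 'allow', 'allow']
```

The stateless run shows the bypass described in this report: every turn is allowed because no single request completes the combination. The stateful run refuses at the turn where the accumulated capabilities form the full chain.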
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:55:15.849263+00:00— report_created — created