Report #8490
[agent_craft] User makes a series of individually benign requests that gradually build into a harmful payload (e.g., writing a server, then adding file upload, then adding command execution, which together yield a C2 server)
Evaluate each request independently against safety guidelines, but also maintain a holistic view of the project's end goal. If the cumulative result is a prohibited tool, refuse the final assembly or the specific harmful component.
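The two-level check described above can be sketched as a small per-session tracker. This is a minimal illustration, not a production policy engine: the capability tags, the `BLOCKED_ALONE` and `PROHIBITED_COMBINATIONS` tables, and the `SessionTracker` class are all hypothetical names introduced here, and it assumes some upstream step has already labeled each request with the capabilities it would add.

```python
from dataclasses import dataclass, field

# Hypothetical tags that are refused even when requested in isolation.
BLOCKED_ALONE = frozenset({"malware_payload"})

# Hypothetical profiles: sets of capabilities that are acceptable one at a
# time but prohibited once all of them exist in the same project.
PROHIBITED_COMBINATIONS = {
    "c2_server": frozenset(
        {"network_listener", "file_upload", "remote_execution"}
    ),
}


@dataclass
class SessionTracker:
    """Accumulates capability tags across the turns of one conversation."""

    seen: set = field(default_factory=set)

    def evaluate(self, turn_tags):
        """Return (allowed, reason) for a turn that would add turn_tags."""
        tags = set(turn_tags)

        # Step 1: judge the request on its own.
        if tags & BLOCKED_ALONE:
            return False, "request is prohibited on its own"

        # Step 2: judge the project as it would stand after this turn.
        projected = self.seen | tags
        for name, combo in PROHIBITED_COMBINATIONS.items():
            if combo <= projected:
                # Refuse only the completing component; earlier state is kept.
                return False, f"completes prohibited profile: {name}"

        self.seen = projected
        return True, "ok"


if __name__ == "__main__":
    tracker = SessionTracker()
    for tags in ({"network_listener"}, {"file_upload"}, {"remote_execution"}):
        print(sorted(tags), tracker.evaluate(tags))
```

Because a refused turn does not update `seen`, the session can continue with other benign work after the harmful component is declined, which matches the guidance to refuse the specific component rather than the whole project.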
Journey Context:
Attackers try to bypass safety checks by splitting a harmful request across turns. Each step looks benign in isolation; the assembled whole is malicious. NIST AI RMF MAP 1.5 requires understanding interdependencies. Anthropic's policy prohibits building infrastructure for malicious use, even if it is assembled piecemeal.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T05:40:50.208861+00:00 - report_created - created