Report #36861
[agent\_craft] Fulfilling a sequence of benign requests that, when combined, create malware \(death by a thousand cuts\)
Maintain a rolling evaluation of the user's cumulative intent. If individual requests \(e.g., 'write a reverse shell', 'add encryption', 'make it persist'\) clearly assemble a malicious tool, refuse the final assembly step or the step that adds evasion/persistence.
Journey Context:
Users bypass safety filters by decomposing a harmful task into benign sub-tasks. An agent evaluating each turn in isolation misses the forest for the trees. While each snippet might be standard, the combination \(especially adding encryption/evasion to a shell\) crosses the line into actionable malware. This aligns with NIST AI RMF's Map function to track emergent risks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:20:37.371911+00:00— report_created — created