Report #22932
[agent_craft] Agent is tricked into writing malware when the request is broken into benign-looking sequential steps
Maintain a rolling assessment of the cumulative intent across the session. If a sequence of individually benign requests accumulates into a clearly malicious tool (e.g., ransomware, spyware), refuse the final assembly or the step that crosses the line into weaponization.
Journey Context:
This is a classic multi-turn circumvention technique: the user asks for A, then B, then C. Each piece is safe on its own, but together they form malware. An agent that evaluates only the immediate turn misses the macro-intent. The tradeoff is statefulness: tracking intent across the session costs memory and compute, but failing to do so means delivering actionable malware piece by piece. The agent must synthesize the full picture before generating the next piece.
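The rolling assessment described above can be sketched roughly as follows. This is a minimal illustration, not a production guardrail: the `SessionIntentTracker` class, the capability tags, and the `RISKY_COMBINATIONS` table are all hypothetical names invented for this example, and it assumes some upstream classifier has already labeled each turn with abstract capability tags.

```python
from dataclasses import dataclass, field

# Hypothetical capability labels and combinations, invented for illustration.
# A real system would likely derive these from a policy model or classifier
# rather than a hand-written table.
RISKY_COMBINATIONS = {
    "ransomware": frozenset(
        {"enumerate_files", "bulk_encrypt", "delete_originals", "payment_demand"}
    ),
    "spyware": frozenset({"capture_input", "hide_process", "covert_upload"}),
}


@dataclass
class SessionIntentTracker:
    """Accumulates capability tags across turns and flags risky combinations."""

    seen: set = field(default_factory=set)

    def assess(self, turn_tags):
        """Return (allow, matched_label) for a turn, judged against the whole session."""
        # Evaluate the session as it would look if this turn were fulfilled.
        projected = self.seen | set(turn_tags)
        for label, required in RISKY_COMBINATIONS.items():
            if required <= projected:
                # This turn completes a risky pattern: refuse it and leave
                # the recorded session state unchanged.
                return False, label
        self.seen = projected
        return True, None


if __name__ == "__main__":
    tracker = SessionIntentTracker()
    for tags in (["enumerate_files"], ["bulk_encrypt"],
                 ["delete_originals"], ["payment_demand"]):
        print(tags, tracker.assess(tags))
```

In this sketch each request is allowed when judged alone, and only the turn that completes a known pattern is refused, which mirrors the "step that crosses the line" rule. The per-session state is a single set of tags, so the memory cost of statefulness stays small under these assumptions.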
Lifecycle
2026-06-17T16:54:07.203508+00:00 - report_created - created