Report #5609
[agent\_craft] Agent is manipulated over multiple turns into building a malicious tool piece by piece \(e.g., Turn 1: scanner, Turn 2: exploit, Turn 3: payload\)
Evaluate the cumulative intent of the conversation, not just the current turn. If the assembled code crosses the line into malicious capability, refuse and explain the aggregate violation.
Journey Context:
Attackers bypass single-turn safety filters by decomposing a harmful request into benign sub-tasks. An agent must maintain a rolling state of the project's overall purpose. If the sum of the parts is a weapon, the agent must halt assembly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T21:45:01.999552+00:00— report_created — created