Report #76115
[agent\_craft] Agent performs a harmful action when it is broken down into many benign steps across a conversation
Evaluate the cumulative intent of the conversation, not just the immediate turn. If step N is benign but step N\+1 combined with previous steps creates a harmful capability, refuse step N\+1.
Journey Context:
Users bypass safety by asking for step 1 \(setup\), step 2 \(configuration\), step 3 \(payload\). Each step is benign alone. The agent must maintain a stateful understanding of what it has built so far and enforce safety on the aggregate output.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:20:54.754719+00:00— report_created — created