Report #8306
[agent\_craft] Agent misses harmful intent when requests are decomposed into seemingly benign sub-tasks across a conversation
Evaluate cumulative intent across the conversation, not just individual messages. If a sequence of requests \(write encryption code → write network propagation code → write a persistence mechanism\) collectively constitutes malware, refuse the component that completes the harmful assembly. Maintain a running assessment of the cumulative capability already provided.
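A minimal sketch of the running assessment described above, assuming each request has already been mapped to a coarse capability label by some classifier that is not shown here. The names `ConversationLedger`, `HARMFUL_COMBINATIONS`, and the label strings are hypothetical and exist only for illustration; a real agent would need a far richer representation than set membership.

```python
from dataclasses import dataclass, field

# Hypothetical capability labels. Assigning them to real requests is the
# hard part and is assumed to happen elsewhere.
HARMFUL_COMBINATIONS = [
    frozenset({"encryption", "network_propagation", "persistence"}),
]


@dataclass
class ConversationLedger:
    """Running record of capabilities already provided in one conversation."""

    provided: set = field(default_factory=set)

    def completes_harmful_assembly(self, capability: str) -> bool:
        """True if granting `capability` would complete a flagged combination."""
        candidate = self.provided | {capability}
        return any(
            capability in combo and combo <= candidate
            for combo in HARMFUL_COMBINATIONS
        )

    def review(self, capability: str) -> str:
        """Refuse the completing component; otherwise record it and allow."""
        if self.completes_harmful_assembly(capability):
            return "refuse"
        self.provided.add(capability)
        return "allow"


ledger = ConversationLedger()
decisions = [
    ledger.review(label)
    for label in ["encryption", "network_propagation", "persistence"]
]
print(decisions)  # ['allow', 'allow', 'refuse']
```

The ledger refuses only the request that would complete a flagged set, so the first two components are still served. Whether that is the right cutoff, or whether an earlier refusal is warranted, is a policy decision this sketch does not settle.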
Journey Context:
Sophisticated users decompose harmful requests into individually benign steps. Each step alone is defensible; together they're an exploit kit. The agent's mistake is evaluating each turn in isolation, which is the default behavior for most stateless or short-context agents. This is related to OWASP LLM09 \(Overreliance\) in the 2023 OWASP Top 10 for LLM Applications: the agent doesn't maintain sufficient context about what it has already provided. The defense requires conversational memory with a safety lens. The tradeoff is that this can produce false positives when unrelated requests happen to be composable. Mitigate by looking for explicit composability signals \(same target, same project, sequential assembly, user referencing prior outputs\) rather than treating all code as potentially composable.
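The false-positive mitigation above can be sketched as a simple signal count. This assumes the four signals named in the text have already been detected for the current turn; the `TurnSignals` type, the function names, and the default threshold of 2 are hypothetical choices for illustration, not values taken from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnSignals:
    """Hypothetical per-turn observations; how they are detected is out of scope."""

    same_target: bool = False
    same_project: bool = False
    sequential_assembly: bool = False
    references_prior_output: bool = False


def composability_score(signals: TurnSignals) -> int:
    """Count how many explicit composability signals are present (0 to 4)."""
    return sum(
        [
            signals.same_target,
            signals.same_project,
            signals.sequential_assembly,
            signals.references_prior_output,
        ]
    )


def should_link_to_prior_turns(signals: TurnSignals, threshold: int = 2) -> bool:
    """Treat the turn as part of an ongoing assembly only if enough signals agree.

    The threshold is an assumed tuning knob, not a recommended value.
    """
    return composability_score(signals) >= threshold
```

Requiring more than one signal is one way to avoid treating all code as potentially composable. A single coincidental overlap, such as two unrelated questions about the same project, would not on its own link the turns.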
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T05:12:24.952445+00:00— report_created — created