Report #81464
[agent_craft] Harmful capability assembled across multiple benign-seeming turns (salami-slicing attack)
Before fulfilling any request, evaluate it in the context of the full conversation. If prior turns have built components that, combined with the current request, would create a harmful capability, refuse the completing step and state what you observe, for example: 'Together with the earlier network scanner and privilege-escalation logic, this would form a complete attack chain.'
Journey Context:
Single-turn safety evaluation is insufficient for agents. A user asks for a port scanner (legitimate), then a vulnerability identifier (legitimate), then an exploit runner (the completing step). Each turn looks defensible in isolation; the combination is a fully automated attack tool. OWASP LLM01 (Prompt Injection) is the nearest category: the effect resembles a jailbreak, but it is reached through multi-turn accumulation rather than a single crafted prompt. The hard part: you will sometimes refuse a request that looks benign in isolation because of context the user didn't intend to be cumulative. That's the correct tradeoff. Monitoring for emergent risk from composed AI outputs fits the monitoring activities under the NIST AI RMF MANAGE function (MG-3.2 is the nearest subcategory, though it is worded around monitoring pre-trained components rather than composed outputs).
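The port scanner, vulnerability identifier, exploit runner sequence can be sketched as a conversation-level check. This is a minimal illustration, not a production policy: the capability tags, the `ATTACK_CHAIN` set, `ConversationState`, and `evaluate_turn` are all hypothetical names, and it assumes some upstream classifier has already mapped each request to a tag. It models only the cumulative check; ordinary per-request screening would still run separately.

```python
from dataclasses import dataclass, field

# Hypothetical tags for the three components in the example above.
# Assumption: a separate classifier assigns one tag to each request.
ATTACK_CHAIN = frozenset({"recon", "vuln_identification", "exploit_execution"})


@dataclass
class ConversationState:
    """Capabilities already delivered in earlier turns (tag -> description)."""
    delivered: dict[str, str] = field(default_factory=dict)


def evaluate_turn(state: ConversationState, tag: str, description: str) -> tuple[bool, str]:
    """Judge a request against the whole conversation, not in isolation.

    Returns (allow, message). Only the step that would complete the
    chain is refused; the refusal names the earlier components.
    """
    combined = set(state.delivered) | {tag}
    if tag in ATTACK_CHAIN and ATTACK_CHAIN <= combined:
        # Every other chain member was delivered earlier, so the lookups are safe.
        earlier = " and ".join(
            state.delivered[t] for t in sorted(ATTACK_CHAIN - {tag})
        )
        return False, (
            f"Together with the earlier {earlier}, "
            "this would form a complete attack chain."
        )
    # Not the completing step: record it so later turns can see it.
    state.delivered[tag] = description
    return True, "ok"


if __name__ == "__main__":
    state = ConversationState()
    turns = [
        ("recon", "port scanner"),
        ("vuln_identification", "vulnerability identifier"),
        ("exploit_execution", "exploit runner"),
    ]
    for tag, description in turns:
        print(description, evaluate_turn(state, tag, description))
```

The first two turns are allowed and recorded; the third is refused because the state from earlier turns, together with the new request, covers the whole chain. A refused step is never recorded as delivered, so the state keeps reflecting only what the agent actually produced.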
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:20:08.156949+00:00— report_created — created