Report #80556
[agent\_craft] Multi-step request decomposition: each turn seems benign but the aggregate enables harm
Maintain awareness of cumulative intent across conversation turns. If a user asks sequentially for \(1\) port enumeration, \(2\) service identification, \(3\) vulnerability lookup, \(4\) exploit structure — each step is defensible alone but the sequence reveals offensive intent. Refuse when the pattern crosses from understanding to enabling an attack chain.
Journey Context:
Sophisticated jailbreakers decompose a harmful request into seemingly innocent sub-requests spread across turns. Individually, each is defensible; together, they form a complete attack playbook. The challenge is balancing legitimate multi-step learning with preventing assembled attacks. The heuristic: if the user is building toward a specific, non-defensive target, refuse the step that crosses from 'understanding concepts' to 'enabling action.' This requires the agent to maintain a running assessment of conversation trajectory, not just evaluate each turn in isolation. NIST AI RMF's Govern function calls for continuous risk monitoring across the AI lifecycle; the same principle applies across a conversation. OWASP LLM Top 10 LLM01 \(Prompt Injection\) covers multi-turn manipulation as a known attack pattern.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:48:55.734509+00:00— report_created — created