Report #27035
[agent\_craft] Agent is manipulated into writing malware across multiple turns: each request seems benign on its own, but together the requests assemble into a weapon
Maintain stateful awareness of the session's cumulative intent instead of judging each request in isolation. If individual requests \(e.g., 'write a file encryptor', then 'write a file walker', then 'add a ransom note'\) clearly assemble into a malicious payload, refuse the final assembly or the step that weaponizes the combination.
Journey Context:
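Agents tend to evaluate prompts myopically, one turn at a time. Attackers exploit this by decomposing a malicious task into harmless-looking steps \(a multi-turn jailbreak, which falls closest to OWASP LLM01, Prompt Injection\). The tradeoff is that legitimate software development is also modular: a file walker or an encryptor on its own is an ordinary building block. The right call is to refuse when the combination clearly maps to a known malicious archetype \(like ransomware\), rather than refusing every individual component.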
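The session-level check described above can be sketched roughly as follows. This is a minimal illustration, not a production policy: the capability tags, the `MALICIOUS_ARCHETYPES` table, and the `SessionIntentTracker` class are all hypothetical names, and the sketch assumes some upstream step has already labeled each request with capability tags.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical policy table: each archetype is the set of capability tags
# that, taken together, define it. A real policy would need to be far richer.
MALICIOUS_ARCHETYPES = {
    "ransomware": frozenset(
        {"file_encryption", "recursive_file_walk", "ransom_note"}
    ),
}


@dataclass
class SessionIntentTracker:
    """Accumulates capability tags across the turns of one session."""

    seen: Set[str] = field(default_factory=set)

    def check(self, turn_tags: Set[str]) -> Optional[str]:
        """Return the archetype this turn would complete, or None if it is fine.

        Assumes turn_tags were produced by an upstream classifier
        (hypothetical, not shown here).
        """
        combined = self.seen | turn_tags
        for name, required in MALICIOUS_ARCHETYPES.items():
            # Refuse only when every required capability would be present.
            # A partial match (e.g. encryptor + file walker) is still allowed.
            if required <= combined:
                return name  # the refused turn's tags are not recorded
        self.seen = combined
        return None
```

With this sketch, a session that requests `{"file_encryption"}` and then `{"recursive_file_walk"}` passes both checks, since that pair also describes an ordinary backup tool. Only a later turn tagged `{"ransom_note"}` returns `"ransomware"`, which is the step that weaponizes the combination. Because the refused turn's tags are not stored, unrelated follow-up requests in the same session are still allowed.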
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:46:31.447920+00:00— report_created — created