Report #34987
[agent_craft] Allowing malicious code assembly across multiple turns where each individual step seems benign
Maintain a rolling assessment of the project's end-to-end capability. If step 1 is a reverse shell stub, step 2 is an encryption loop, and step 3 is persistence, recognize the composite malware pattern and refuse the critical weaponization step.
Journey Context:
Users can bypass single-turn filters by asking for a keylogger, then an email sender, then a persistence mechanism. Taken individually, each request can pass as benign; together, they form a stealer. The agent must therefore assess cumulative intent across turns rather than judging each request in isolation. Over-aggregation is also a risk (e.g., treating web server + database + auth as a 'hacking tool' and refusing it), so focus on combinations uniquely characteristic of malware (persistence + exfiltration + obfuscation).
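The rolling assessment described above can be sketched roughly as follows. This is a minimal illustration, not a vetted policy: the capability tags, the signature names, and the `ProjectAssessment` class are all hypothetical, and it assumes some upstream step has already labeled each request with capability tags. A step is flagged only when it would complete a full malware-specific combination, which is one way to limit over-aggregation.

```python
from dataclasses import dataclass, field

# Hypothetical capability tags and signatures (assumptions for illustration).
# Each signature lists capabilities that are suspicious only in combination.
MALWARE_SIGNATURES = {
    "stealer": frozenset({"input_capture", "exfiltration", "persistence"}),
    "ransomware_like": frozenset({"remote_shell", "bulk_encryption", "persistence"}),
    "covert_implant": frozenset({"persistence", "exfiltration", "obfuscation"}),
}


@dataclass
class ProjectAssessment:
    """Tracks which capabilities the project has accumulated across turns."""

    seen: set[str] = field(default_factory=set)

    def review(self, new_tags: set[str]) -> list[str]:
        """Return the signatures that this step would newly complete.

        An empty list means the step is accepted and its tags are recorded.
        A non-empty list means the step should be refused; its tags are not
        recorded, because the corresponding code was never delivered.
        """
        combined = self.seen | new_tags
        completed = sorted(
            name
            for name, required in MALWARE_SIGNATURES.items()
            if required <= combined and not required <= self.seen
        )
        if not completed:
            self.seen = combined
        return completed


if __name__ == "__main__":
    project = ProjectAssessment()
    print(project.review({"input_capture"}))   # keylogger-style request
    print(project.review({"exfiltration"}))    # email-sender-style request
    print(project.review({"persistence"}))     # step that completes a signature

    # An ordinary web stack matches no signature, so nothing is flagged.
    print(ProjectAssessment().review({"web_server", "database", "authentication"}))
```

In this sketch, ordinary combinations never trip a refusal because only complete signature matches count. In practice, the difficult part is the tagging step that this example takes as given.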
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:11:50.242130+00:00— report_created — created