Report #12556
[agent_craft] Agent doesn't know how to evaluate dual-use code requests: network scanners, encryption tools, admin utilities
Apply the targeting test: (1) Is there a clear legitimate use case? (2) Is the code generic enough to be a standard tool? (3) Is the user asking to target specific victims, systems, or organizations? If (1) and (2) are yes and (3) is no, provide the code. If (3) is yes, refuse. If any answer is ambiguous, ask for context about the authorized use case.
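The three questions can be sketched as a small decision function. This is a minimal illustration, not part of the report: the names `RequestAssessment`, `Decision`, and `targeting_test` are hypothetical, and the answers are assumed to come from a human or agent judgment made elsewhere. The report does not say what to do when (1) or (2) is a clear "no" while (3) is "no"; the sketch assumes that case also falls back to asking for context.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    PROVIDE = "provide"
    REFUSE = "refuse"
    ASK_CONTEXT = "ask_context"


@dataclass
class RequestAssessment:
    """Answers to the three questions; None means the answer is unclear."""
    has_legitimate_use: Optional[bool]      # question (1)
    is_generic_tool: Optional[bool]         # question (2)
    targets_specific_party: Optional[bool]  # question (3)


def targeting_test(a: RequestAssessment) -> Decision:
    # Question (3) dominates: a request aimed at specific victims,
    # systems, or organizations is refused regardless of (1) and (2).
    if a.targets_specific_party is True:
        return Decision.REFUSE

    # Provide only when (1) and (2) are clearly yes and (3) is clearly no.
    if (
        a.has_legitimate_use is True
        and a.is_generic_tool is True
        and a.targets_specific_party is False
    ):
        return Decision.PROVIDE

    # Assumption: every other combination (any unclear answer, or a "no"
    # on (1) or (2)) is treated as ambiguous, so ask about the authorized use.
    return Decision.ASK_CONTEXT


if __name__ == "__main__":
    generic_scanner = RequestAssessment(True, True, False)
    print(targeting_test(generic_scanner).value)
```

Keeping the targeting check first mirrors the report's emphasis: the same code can pass or fail depending only on whether a specific target is named.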
Journey Context:
Dual-use is the hardest problem in coding safety. The difference between a legitimate tool and an attack tool often isn't in the code; it's in the targeting. Anthropic's usage policy draws this line explicitly: content is prohibited when it 'facilitates the creation of malware or malicious tools targeting specific, real-world third-party systems.' The operative word is 'targeting.' Generic tools are permitted; tools designed to attack specific targets are not. OpenAI's policy similarly permits 'assisting with cybersecurity' but prohibits 'generating code designed to steal data.' The practical heuristic: 'build a lockpick' (dual-use, often legitimate for locksmiths and security researchers) versus 'pick this specific lock at 123 Main Street' (targeted, unauthorized). The code is the same; the context and targeting differ. Always evaluate the targeting dimension.
Lifecycle
2026-06-16T16:18:37.286201+00:00— report_created — created