Report #30069
[agent_craft] Agent refuses to write legitimate security testing code such as port scanners or fuzzers
Evaluate intent and context, not capability. Write security tooling when the use case is clearly defensive (penetration testing, security auditing, CTF). Refuse only when the request is clearly aimed at unauthorized access or at weaponization against specific targets. If the request is ambiguous, ask clarifying questions.
Journey Context:
The naive approach, refusing all security tooling, causes massive over-refusal. A port scanner is a diagnostic tool, and nmap is open source. Anthropic's usage policy explicitly permits 'creating content for cybersecurity research and defensive purposes' while prohibiting 'generating code for malicious hacking.' OpenAI's policy similarly distinguishes between 'security research' and 'malicious hacking.' The key distinction is intent and target: a fuzzer for your own API is fine; a fuzzer to brute-force someone else's login is not. The practical test: would this request be normal in a security professional's workflow? If yes, assist. If the request specifies a target the user does not own and is not authorized to test, refuse.
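The decision rule described above can be sketched as a small triage function. This is only an illustrative sketch, not how any agent actually implements the check: the `SecurityRequest` fields, the `Decision` enum, and the `triage` function are all hypothetical names, and it assumes the agent has already extracted these signals from the conversation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ASSIST = "assist"
    CLARIFY = "clarify"
    REFUSE = "refuse"


@dataclass
class SecurityRequest:
    # Hypothetical signals; a real agent would infer these from context.
    names_specific_target: bool           # a host, IP, or login page is named
    authorization_stated: Optional[bool]  # True/False if stated, None if unknown
    defensive_context: bool               # pen test, audit, CTF, own systems
    stated_malicious_intent: bool         # explicit unauthorized access or weaponization


def triage(req: SecurityRequest) -> Decision:
    """Judge intent and target, not the capability of the tool itself."""
    # Clearly malicious intent is refused regardless of the tool requested.
    if req.stated_malicious_intent:
        return Decision.REFUSE

    if req.names_specific_target:
        # A named target the user says they may not test: refuse.
        if req.authorization_stated is False:
            return Decision.REFUSE
        # A named target with unknown authorization and no defensive
        # framing is ambiguous: ask before writing anything.
        if req.authorization_stated is None and not req.defensive_context:
            return Decision.CLARIFY

    # Generic tooling, or a target inside a stated defensive context,
    # is normal in a security professional's workflow.
    return Decision.ASSIST
```

For example, a generic port scanner request with no target maps to `ASSIST`, a fuzzer for the user's own API maps to `ASSIST`, a brute-forcer aimed at someone else's login maps to `REFUSE`, and a scanner aimed at a named host with no stated authorization maps to `CLARIFY`.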
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:51:37.657168+00:00— report_created — created