Report #5240
[agent_craft] Agent refuses to write any security testing or defensive cybersecurity code
Distinguish between offensive capability transfer (refuse) and defensive or educational security code (assist). If the request has a legitimate security context (authorized penetration testing, defense, understanding vulnerabilities), provide the code along with a note about proper authorization. If the request is for a tool with no legitimate use (malware, exploit payloads), refuse. Apply this test: does the response give someone a new harmful capability they could not easily get elsewhere?
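The rule above can be sketched as a small triage function. This is only an illustration: the `SecurityRequest` fields, the `Decision` enum, and the order of the checks are hypothetical names and assumptions, not part of any real agent framework, and extracting these signals from a real request is the hard part that the sketch leaves out.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ASSIST = "assist"
    REFUSE = "refuse"


@dataclass
class SecurityRequest:
    # Hypothetical signals an agent might extract from a request.
    has_legitimate_use: bool      # False for malware, exploit payloads
    stated_harmful_intent: bool   # e.g. the user says "to find targets"
    new_harmful_capability: bool  # harm the user could not easily get elsewhere


def triage(req: SecurityRequest) -> tuple[Decision, str]:
    """Return a decision and a short reason. Check order is an assumption."""
    if not req.has_legitimate_use:
        return Decision.REFUSE, "tool has no legitimate use"
    if req.stated_harmful_intent:
        return Decision.REFUSE, "request states a harmful purpose"
    if req.new_harmful_capability:
        return Decision.REFUSE, "would transfer a new harmful capability"
    return Decision.ASSIST, "provide code with a note on proper authorization"
```

For example, a port scanner for an authorized audit would map to `SecurityRequest(True, False, False)` and be assisted, while a request for an exploit payload would fail the first check.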
Journey Context:
The common mistake is to blanket-refuse anything security-adjacent, which makes the agent useless to most security professionals. Anthropic's policy draws the line at providing 'actionable instructions' for harmful acts, not at discussing or implementing security concepts. OpenAI's policy explicitly allows 'writing code for defensive cybersecurity purposes' and 'discussing cybersecurity concepts.' The key discriminator is capability transfer: is the agent enabling new harm, or helping someone defend? A port scanner requested 'for my authorized network audit' is a different request from one wanted 'to find targets.' Context clues in the request matter.
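As an illustration of what "provide the code with context about proper authorization" might look like for the port scanner case, here is a minimal sketch of a TCP port check that refuses to touch hosts outside an explicit scope. The `AUTHORIZED_HOSTS` allowlist and the `check_port` name are hypothetical; a real audit tool would load its scope from a signed engagement document rather than a hard-coded set.

```python
import socket

# Hypothetical allowlist: hosts the operator is authorized to audit.
AUTHORIZED_HOSTS = {"127.0.0.1", "localhost"}


def check_port(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    # Enforce the authorization scope before any network activity.
    if host not in AUTHORIZED_HOSTS:
        raise PermissionError(f"{host} is not in the authorized scope")
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

A plain TCP connect check like this is widely documented, so it transfers no new capability; the scope guard is what makes the authorization expectation explicit in the code itself.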
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:53:40.046726+00:00— report_created — created