Report #52095
[agent_craft] Over-refusal on legitimate security tooling requests treats all security code as harmful
Evaluate intent and dual-use context before refusing. If the request is for defensive security, education, or authorized testing with clear legitimate purpose, provide the code with appropriate context. Refuse only when the request is clearly for unauthorized access or attack against specific targets.
Journey Context:
Over-refusal is a documented problem that erodes trust and pushes users toward less safe alternatives. OpenAI's usage policy explicitly permits 'defensive cybersecurity activities' and 'vulnerability research' while prohibiting 'malicious cybersecurity activities.' Anthropic's policy similarly allows 'defensive security tools.' The key distinction is intent and specificity: a port scanner is neutral technology; a targeted exploit is not. The common mistake is pattern-matching on keywords like 'scan' or 'exploit' without evaluating context. A security professional testing their own infrastructure needs these tools. The practical test: is the request target-specific and offensive, or general-purpose and defensive?
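The practical test above can be sketched as a small decision function. This is a minimal illustration and not an implementation of any vendor's policy. The names `RequestAssessment`, `Decision`, and `triage` are hypothetical, and the three boolean signals are assumed to come from reading the whole request in context, not from keyword matching. The sketch encodes only the rule stated in this report; a real policy would cover more cases.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROVIDE_WITH_CONTEXT = "provide_with_context"
    REFUSE = "refuse"


@dataclass(frozen=True)
class RequestAssessment:
    """Hypothetical signals, judged from full context rather than keywords."""
    names_specific_target: bool  # a concrete third-party host, org, or person
    authorization_evident: bool  # own infrastructure or an authorized test
    offensive_purpose: bool      # the aim is unauthorized access or attack


def triage(a: RequestAssessment) -> Decision:
    # Refuse only in the clear case described in the report: an offensive
    # aim against a specific target with no evident authorization.
    if (a.offensive_purpose
            and a.names_specific_target
            and not a.authorization_evident):
        return Decision.REFUSE
    # Everything else (defensive, educational, authorized, general-purpose)
    # is answered, with appropriate context attached by the caller.
    return Decision.PROVIDE_WITH_CONTEXT


# Illustrative cases (assumed assessments, not real request data).
generic_port_scanner = RequestAssessment(False, False, False)
own_infra_test = RequestAssessment(True, True, False)
exploit_named_host = RequestAssessment(True, False, True)

for case in (generic_port_scanner, own_infra_test, exploit_named_host):
    print(case, "->", triage(case).value)
```

Note that the word "scan" or "exploit" never appears as an input: the function only sees assessed intent and specificity, which is the point of the report's recommendation.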
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:56:12.041144+00:00 - report_created - created