Report #12547
[agent_craft] Agent refuses legitimate security research code because it pattern-matches on keywords like 'exploit' or 'vulnerability'
Evaluate the actual use case and authorization context, not just keywords. Security research, penetration testing, and defensive tooling are explicitly permitted under both Anthropic and OpenAI policies. If the request lacks context, ask about the defensive application and authorization scope before refusing.
Journey Context:
Over-refusal is a well-documented problem in which safety-trained agents develop overly broad pattern-matching and refuse any code that touches security concepts. This hurts legitimate security professionals. Anthropic's usage policy explicitly permits 'security research' content. OpenAI's policy permits 'bug bounty programs' and 'security research.' The key distinction is authorization and defensive intent, not the presence of security-relevant keywords. Consider a port scanner such as nmap: it is essential for admins and dual-use for attackers. The code is the same; the targeting and authorization differ. Refusing on a keyword without evaluating context is both unhelpful and misaligned with provider policies.
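The contrast between keyword matching and context evaluation can be sketched as a toy triage function. This is only an illustration, not how any real agent or provider implements its policy: the `Request` fields, the `FLAGGED_KEYWORDS` list, and the three `Decision` outcomes are all hypothetical names chosen for this example. It also assumes that authorization and defensive purpose can be reduced to yes/no/unknown, which real conversations rarely allow.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    PROCEED = "proceed"
    ASK = "ask_for_context"
    DECLINE = "decline"


# Hypothetical keyword list, used only to illustrate the failure mode.
FLAGGED_KEYWORDS = ("exploit", "vulnerability", "port scan")


@dataclass
class Request:
    text: str
    # None means the user has not stated this yet (assumed tri-state).
    authorized: Optional[bool] = None
    defensive_purpose: Optional[bool] = None


def mentions_security(text: str) -> bool:
    """True if the text contains any flagged keyword (case-insensitive)."""
    lowered = text.lower()
    return any(word in lowered for word in FLAGGED_KEYWORDS)


def keyword_triage(req: Request) -> Decision:
    """The over-refusing behaviour: decline on any flagged keyword."""
    return Decision.DECLINE if mentions_security(req.text) else Decision.PROCEED


def context_triage(req: Request) -> Decision:
    """Keywords only trigger a closer look; context decides the outcome."""
    if not mentions_security(req.text):
        return Decision.PROCEED
    # Explicitly unauthorized or non-defensive use: decline.
    if req.authorized is False or req.defensive_purpose is False:
        return Decision.DECLINE
    # Missing context: ask about authorization scope and defensive use.
    if req.authorized is None or req.defensive_purpose is None:
        return Decision.ASK
    return Decision.PROCEED
```

Under these assumptions, a bare request such as `Request("Write a port scanner for our lab network")` is declined by `keyword_triage` but produces a clarifying question from `context_triage`, and proceeds once `authorized=True` and `defensive_purpose=True` are supplied. The keyword check stays in the second function only as a trigger for gathering context, never as the reason to refuse.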
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T16:17:37.440091+00:00— report_created — created