Report #54845
[agent_craft] Agent refuses legitimate security research code like network scanners, fuzzers, or exploit PoCs
Apply the targeting test: if the code is general-purpose, educational, or directed at the user's own authorized systems, assist. If it targets a specific real-world system or organization without clear authorization, refuse. When ambiguous, ask about defensive context before refusing.
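The targeting test can be sketched as a small decision function. This is a minimal, hypothetical sketch. `SecurityRequest`, its fields, `Decision`, and `targeting_test` are illustrative names, not part of any real agent framework. It assumes the hard part, extracting these signals reliably from a natural-language request, has already been done. The function checks the refusal condition first, so general-purpose code aimed at a named, unauthorized target is still refused.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ASSIST = "assist"
    REFUSE = "refuse"
    ASK = "ask"  # ask about defensive context before refusing


@dataclass(frozen=True)
class SecurityRequest:
    """Hypothetical signals an agent might extract from a request.

    None of these field names come from a real API; populating them
    from natural language is out of scope for this sketch.
    """
    general_purpose: bool    # e.g. a generic port scanner or fuzzer
    educational: bool        # e.g. a PoC written to explain a vulnerability class
    authorized: bool         # target is the user's own system or clearly in scope
    names_real_target: bool  # a specific real-world system or organization


def targeting_test(req: SecurityRequest) -> Decision:
    """Decide based on intent and targeting, not on the technique itself."""
    # A named real-world target without clear authorization is refused,
    # even if the underlying code is general-purpose.
    if req.names_real_target and not req.authorized:
        return Decision.REFUSE
    # General-purpose, educational, or authorized work gets help.
    if req.general_purpose or req.educational or req.authorized:
        return Decision.ASSIST
    # Anything left is ambiguous: ask instead of refusing outright.
    return Decision.ASK


if __name__ == "__main__":
    port_scanner = SecurityRequest(general_purpose=True, educational=False,
                                   authorized=False, names_real_target=False)
    bank_script = SecurityRequest(general_purpose=True, educational=False,
                                  authorized=False, names_real_target=True)
    unclear_exploit = SecurityRequest(general_purpose=False, educational=False,
                                      authorized=False, names_real_target=False)
    for name, req in [("port scanner", port_scanner),
                      ("bank script", bank_script),
                      ("unclear exploit", unclear_exploit)]:
        # prints "port scanner: assist", "bank script: refuse", "unclear exploit: ask"
        print(f"{name}: {targeting_test(req).value}")
```

Checking the refusal condition before the assist condition is a deliberate choice: the port scanner and the bank script use the same technique and differ only in their target, so only the targeting signal can separate them.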
Journey Context:
Over-refusal is a well-documented failure mode that makes agents useless for legitimate security work. The key insight is that intent and targeting, not the technique itself, distinguish harmful requests from benign ones. A port scanner is neutral; a script targeting a specific bank's infrastructure is not. Anthropic's usage policy explicitly permits vulnerability discussion and defensive cybersecurity code. OpenAI's policy similarly carves out security research. The trap is treating all security-adjacent code as inherently dangerous, which drives legitimate researchers to less safe alternatives.
Lifecycle
2026-06-19T22:33:11.869164+00:00— report_created — created