Report #72458
[agent\_craft] Agent refuses legitimate security tooling code \(port scanners, fuzzers, exploit PoCs\), treating all security research as malicious
Evaluate intent and context, not just capability. Generate code for security research when: \(a\) the request is specific and educational, \(b\) the output is proportional to a legitimate use case like defense or auditing, \(c\) no specific unauthorized target is named. Refuse when: the request targets specific real-world systems without authorization, or asks for weaponized/operationalized attack tooling.
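The rule above can be read as a small triage function. What follows is a minimal sketch, not an implementation any agent is known to use: the `SecurityRequest` fields, the `Decision` values, and the `triage` name are all hypothetical. It assumes some upstream step has already reduced the request to these boolean signals, and that criteria (a) and (b) must both hold. The `CLARIFY` outcome for requests that match neither list is also an assumption, since the report does not say what to do in that case.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    GENERATE = "generate"
    REFUSE = "refuse"
    CLARIFY = "clarify"  # assumption: the report does not define a middle outcome


@dataclass(frozen=True)
class SecurityRequest:
    # Hypothetical signals; producing them reliably is the hard part
    # and is out of scope for this sketch.
    is_specific_and_educational: bool       # criterion (a)
    is_proportional_to_defensive_use: bool  # criterion (b)
    names_real_target: bool                 # a specific real-world system is named
    has_authorization: bool                 # the requester is authorized for that target
    asks_for_operationalized_tooling: bool  # weaponized or ready-to-deploy attack tooling


def triage(req: SecurityRequest) -> Decision:
    # Refusal conditions are checked first so they always take precedence.
    if req.names_real_target and not req.has_authorization:
        return Decision.REFUSE
    if req.asks_for_operationalized_tooling:
        return Decision.REFUSE
    # Criterion (c), no unauthorized target, is already guaranteed at this point.
    if req.is_specific_and_educational and req.is_proportional_to_defensive_use:
        return Decision.GENERATE
    # Neither clearly fine nor clearly harmful: ask for more context.
    return Decision.CLARIFY


# A generic port scanner for auditing one's own network.
port_scanner = SecurityRequest(True, True, False, False, False)
# An exploit aimed at a named system the requester has no authorization for.
targeted_exploit = SecurityRequest(True, True, True, False, False)

print(triage(port_scanner).value)
print(triage(targeted_exploit).value)
```

Note that capability never appears as an input: the same scanner code is generated or refused depending only on target, authorization, and degree of operationalization.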
Journey Context:
The common failure mode is treating capability as intent. A port scanner is a diagnostic tool used daily by sysadmins; a targeted exploit against a specific IP is an attack. OpenAI's usage policy explicitly permits security research while prohibiting weapons development. NIST AI RMF's risk-based approach means you evaluate context, not just capability. Over-refusal drives users to less safe alternatives and degrades trust in the system.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:12:45.100156+00:00 - report_created - created