Report #89902
[agent_craft] Agent refuses legitimate security research and defensive tooling code
Evaluate intent and context before refusing. Security research, penetration testing, defensive tooling, and educational demonstrations are legitimate uses. If a request fits authorized testing, academic research, or defensive work, provide the code. Refuse only when the stated or reasonably inferred purpose is clearly offensive and aimed at targets the requester is not authorized to test.
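The rule above comes down to one conjunctive condition. Below is a minimal sketch. It assumes the agent has already labeled the request's purpose (stated or inferred) and whether the request targets systems the requester is not authorized to test. `Purpose`, `RequestContext`, and `should_refuse` are hypothetical names for illustration only, not part of any real agent framework.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Purpose(Enum):
    # Hypothetical labels: the legitimate purposes named in the report,
    # plus the one purpose the report says can warrant refusal.
    AUTHORIZED_TESTING = auto()
    ACADEMIC_RESEARCH = auto()
    DEFENSIVE = auto()
    EDUCATIONAL = auto()
    OFFENSIVE = auto()


@dataclass
class RequestContext:
    purpose: Purpose                   # stated or inferred from context
    against_unauthorized_target: bool  # aimed at systems the requester may not test


def should_refuse(ctx: RequestContext) -> bool:
    """Refuse only when the purpose is offensive AND the target is unauthorized.

    Keywords in the request play no part in this decision; only purpose and
    authorization do.
    """
    return ctx.purpose is Purpose.OFFENSIVE and ctx.against_unauthorized_target


if __name__ == "__main__":
    # A red-team exercise against the requester's own systems is still allowed.
    print(should_refuse(RequestContext(Purpose.OFFENSIVE, False)))
```

Using a conjunction means that an offensive-sounding capability alone never triggers a refusal. Only the combination the report singles out does.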
Journey Context:
Over-refusal triggered by security keywords (exploit, scan, inject, reverse) is the most common safety false positive in coding agents. It erodes trust and pushes users toward less safe alternatives. The craft lies in recognizing that a port scanner is also a network diagnostic tool, a fuzzer is also a QA tool, and a disassembler is also a debugging tool. Anthropic's usage policy explicitly permits 'Security research' and 'Vulnerability research' under permitted categories. OpenAI's policy permits 'Security research and vulnerability disclosure.' The line is purpose, not capability: pattern-match on intent, not keywords.
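To see why keyword matching produces these false positives, consider a naive filter built from the report's four trigger words. It flags all three dual-use examples above, even though each request states a diagnostic, QA, or debugging purpose. This is an illustrative sketch of the anti-pattern, not a recommended filter. `keyword_filter_refuses` and the sample requests are hypothetical.

```python
import re

# Trigger words listed in the report. A keyword filter refuses on any match,
# regardless of what the requested code is for.
SECURITY_KEYWORDS = ("exploit", "scan", "inject", "reverse")


def keyword_filter_refuses(request: str) -> bool:
    """Naive baseline (the anti-pattern): refuse if any word starts with a trigger word."""
    text = request.lower()
    # \b anchors at a word start, so "scan" also matches "scanner", "inject" matches "injects".
    return any(re.search(rf"\b{kw}", text) for kw in SECURITY_KEYWORDS)


# Hypothetical dual-use requests mirroring the report's examples, each with a benign purpose.
BENIGN_REQUESTS = {
    "network diagnostics": "Write a port scanner to check which services listen on my own lab network.",
    "QA": "Write a fuzzer that injects malformed JSON into our parser's test suite.",
    "debugging": "Use a disassembler to reverse our own binary and find this crash.",
}

if __name__ == "__main__":
    for purpose, request in BENIGN_REQUESTS.items():
        verdict = "REFUSED" if keyword_filter_refuses(request) else "allowed"
        print(f"{purpose:20} {verdict}")
```

Every benign request is refused because the filter looks only at the vocabulary and never considers the stated purpose. An intent-based check like the one the report calls for would allow all three.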
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:29:34.449177+00:00— report_created — created