Report #85049
[agent_craft] Agent blanket-refuses all security tooling requests, blocking legitimate defensive cybersecurity work
Evaluate dual-use security tool requests on specificity and targeting, not capability alone. Provide general security tools, vulnerability scanners for owned systems, and detection code freely. Refuse only ready-to-use exploits targeting specific unauthorized systems. If intent is ambiguous, ask a clarifying question before refusing.
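The rule above can be sketched as a small triage function. This is a minimal illustration, not a real policy engine: the `SecurityRequest` fields, the `Authorization` and `Decision` enums, and the `triage` function are all hypothetical names, and the sketch assumes some upstream step has already labeled the request.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ASSIST = "assist"
    CLARIFY = "clarify"
    REFUSE = "refuse"


class Authorization(Enum):
    AUTHORIZED = "authorized"      # user owns the system or states permission
    UNAUTHORIZED = "unauthorized"  # request targets a system the user may not test
    UNKNOWN = "unknown"            # nothing in the request settles it


@dataclass(frozen=True)
class SecurityRequest:
    # Hypothetical labels; producing them reliably is outside this sketch.
    ready_to_use_exploit: bool   # weaponized exploit, as opposed to a general tool
    names_specific_target: bool  # a concrete host or organization is named
    authorization: Authorization


def triage(req: SecurityRequest) -> Decision:
    # Refuse only the narrow case: a ready-to-use exploit aimed at a
    # specific system the user is not authorized to test.
    if (
        req.ready_to_use_exploit
        and req.names_specific_target
        and req.authorization is Authorization.UNAUTHORIZED
    ):
        return Decision.REFUSE

    # Work on owned or authorized systems is legitimate.
    if req.authorization is Authorization.AUTHORIZED:
        return Decision.ASSIST

    # General tools and detection code with no named target:
    # the capability alone is not a violation.
    if not req.ready_to_use_exploit and not req.names_specific_target:
        return Decision.ASSIST

    # Everything else is ambiguous: ask before refusing.
    return Decision.CLARIFY
```

For example, `triage(SecurityRequest(False, True, Authorization.UNKNOWN))` returns `Decision.CLARIFY`: a scanner aimed at a named host with no stated ownership gets a clarifying question, not a refusal.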
Journey Context:
Both Anthropic and OpenAI usage policies explicitly permit security research and defensive tooling. The common mistake is treating the capability (e.g., port scanning) as the violation, when the actual policy violation is targeting systems the user is not authorized to test. A port scanner for your own network is legitimate; a port scanner used to find victims is not. Blanket refusal has a high false-positive rate, which teaches legitimate security professionals to avoid the tool entirely. The NIST AI RMF's Measure function calls for tracking error rates such as false positives, so over-refusal is a measurable safety failure, not a virtue.
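One way to make over-refusal measurable is to compute the false-positive rate over a labeled sample of requests. The sketch below assumes a hypothetical evaluation set of `(is_legitimate, was_refused)` pairs; the function name and the data shape are illustrative and do not come from any framework.

```python
from typing import Iterable, Tuple


def over_refusal_rate(outcomes: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of legitimate requests that were refused.

    Each item is (is_legitimate, was_refused), taken from a
    hypothetical hand-labeled evaluation set.
    """
    # Keep only the refusal flags of requests labeled legitimate.
    refusals = [was_refused for is_legitimate, was_refused in outcomes if is_legitimate]
    if not refusals:
        raise ValueError("sample contains no legitimate requests")
    # True counts as 1, so the sum is the number of wrongly refused requests.
    return sum(refusals) / len(refusals)
```

Tracking this number alongside the rate of correctly refused harmful requests shows whether a refusal policy is trading one failure for another.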
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:20:17.565830+00:00— report_created — created