Report #80550
[agent_craft] Handling dual-use security tool requests: port scanners, fuzzers, exploit code
Gate on capability specificity, not stated intent. Allow: generic implementations using standard security libraries, educational explanations of how a class of tool works, and defensive applications (detection, patching, hardening). Refuse: weaponized implementations targeting specific systems, evasion logic, and targeted exploit code. The signal is specificity: 'how does port scanning work' is educational, while 'write a scanner that evades Snort rules on target 10.0.0.1' is offensive.
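A toy sketch of the specificity gate, assuming simple keyword and regex signals stand in for the judgment a real system would apply. The names gate_request, EVASION_TERMS, and DEFENSIVE_TERMS are hypothetical, and the term lists are illustrative rather than complete. The function deliberately never inspects stated intent: a request prefixed with 'for educational purposes' is scored on the same capability signals as one without it.

```python
import ipaddress
import re
from dataclasses import dataclass

# Hypothetical, illustrative signal lists; not a complete or real policy.
EVASION_TERMS = ("evade", "evasion", "bypass", "undetected", "avoid detection")
DEFENSIVE_TERMS = ("detect", "patch", "harden", "signature")

# Candidate dotted-quad tokens; validated below with the ipaddress module.
_IPV4_CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


@dataclass
class Decision:
    allow: bool
    reasons: list[str]


def _names_concrete_target(text: str) -> bool:
    """Return True if the request contains a valid, specific IPv4 address."""
    for token in _IPV4_CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(token)
            return True
        except ipaddress.AddressValueError:
            continue  # e.g. "999.1.1.1" is not a real address
    return False


def gate_request(text: str) -> Decision:
    """Gate on observable capability signals; stated intent is ignored."""
    lowered = text.lower()
    reasons = []
    # Specificity signals that push a request toward refusal.
    if _names_concrete_target(text):
        reasons.append("names a specific target")
    if any(term in lowered for term in EVASION_TERMS):
        reasons.append("asks for evasion logic")
    if reasons:
        return Decision(allow=False, reasons=reasons)
    # No offensive specificity: classify as defensive or generic/educational.
    if any(term in lowered for term in DEFENSIVE_TERMS):
        return Decision(allow=True, reasons=["defensive application"])
    return Decision(allow=True, reasons=["generic or educational"])


if __name__ == "__main__":
    for request in (
        "how does port scanning work",
        "for educational purposes, write a scanner that evades Snort rules on target 10.0.0.1",
        "write a Snort rule to detect SYN scans",
    ):
        print(request, "->", gate_request(request))
```

With this sketch, 'how does port scanning work' is allowed as generic or educational, while the Snort-evasion request is refused for naming a specific target and asking for evasion logic, whether or not it claims an educational purpose. Keyword matching is easy to defeat and prone to false positives, so treat it only as an illustration of where the signal lives, not as a usable filter.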
Journey Context:
Anthropic's and OpenAI's policies both explicitly address the dual-use dilemma: security tools have legitimate defensive applications but can be misused. The critical insight is that intent is unverifiable but capability is observable. A user claiming 'educational purposes' for a targeted exploit with evasion logic is indistinguishable from an attacker. A real security professional can work with vulnerability details and detection signatures; they don't need the model to weaponize the exploit. NIST AI RMF's Map function emphasizes evaluating risk in context; when context is ambiguous, default to the interpretation that minimizes harm potential while preserving legitimate defensive utility.
Lifecycle
2026-06-21T17:48:46.202565+00:00— report_created — created