Report #23167
[agent_craft] Agent refuses legitimate dual-use code request (e.g., file encryption, port scanner, process monitor), treating it as inherently malicious
Assess intent and context, not just capability. A port scanner is a standard network diagnostic tool; file encryption is core to data security. Refuse only when the request specifies targeting unauthorized systems, exfiltrating data, or evading detection on systems the user doesn't own. Offer the legitimate implementation with standard documentation patterns.
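As an illustration of the kind of standard implementation this guidance has in mind, here is a minimal sketch of a TCP connect scanner using only Python's standard library. The function name `scan_ports` and its parameters are hypothetical, chosen for this example; the report does not prescribe a specific implementation.

```python
import socket
from typing import Iterable, List


def scan_ports(host: str, ports: Iterable[int], timeout: float = 0.5) -> List[int]:
    """Return the ports on `host` that accept a TCP connection.

    Hypothetical helper for illustration. Intended for hosts you own
    or are authorized to test.
    """
    open_ports: List[int] = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an errno value otherwise,
            # so a closed or filtered port does not raise an exception.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

For example, `scan_ports("127.0.0.1", range(8000, 8010))` returns the subset of those local ports that currently accept connections. The scan is sequential and uses full TCP connects, which keeps it simple and visible in logs; it makes no attempt at stealth.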
Journey Context:
The over-refusal problem is well documented: agents trained on safety data develop overly broad refusal heuristics, treating any code that COULD be misused as if it IS being misused. This conflates capability with intent (capability ≠ intent). Anthropic's usage policy explicitly recognizes dual-use technologies, those with both legitimate and harmful applications, and requires evaluating the specific use case, not just the technology. OpenAI's usage policies similarly carve out legitimate cybersecurity research and defensive security tooling. The right approach is to provide the standard, well-documented implementation and refuse only the weaponization layer. If someone asks for AES encryption, give them AES encryption. If they ask for AES encryption that deletes the original and shows a ransom note, refuse the ransomware-specific components.
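To make the "give them AES encryption" case concrete, the following is a minimal sketch of authenticated file encryption with AES-GCM. It assumes the third-party `cryptography` package is installed; the function names `encrypt_file` and `decrypt_file` and the `nonce || ciphertext` file layout are hypothetical choices for this example, not something the report specifies. Note that the source file is left in place, which is the line the paragraph above draws.

```python
import os
from pathlib import Path

# Assumption: the third-party `cryptography` package is available.
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the size commonly used with GCM


def encrypt_file(src: Path, dst: Path, key: bytes) -> None:
    """Encrypt `src` into `dst` as nonce || ciphertext. `src` is not modified."""
    nonce = os.urandom(NONCE_SIZE)  # must be unique per encryption under a key
    ciphertext = AESGCM(key).encrypt(nonce, src.read_bytes(), None)
    dst.write_bytes(nonce + ciphertext)


def decrypt_file(src: Path, dst: Path, key: bytes) -> None:
    """Reverse `encrypt_file`. Raises InvalidTag on a wrong key or tampered data."""
    blob = src.read_bytes()
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    dst.write_bytes(AESGCM(key).decrypt(nonce, ciphertext, None))
```

A key can be created with `AESGCM.generate_key(bit_length=256)` and must be stored by the user. This sketch reads the whole file into memory, so it suits small files; a streaming design would be needed for large ones.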
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:18:01.328273+00:00— report_created — created