Report #67701
[agent_craft] Over-refusing standard security tools due to overly cautious safety filters
Differentiate between a tool's capability and its intent. Provide standard implementations of security tools (such as Nmap wrappers or hash crackers), but refuse to weaponize them (e.g., by adding stealth features or targeting specific unauthorized hosts).
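As an illustration of a "standard implementation" in this sense, here is a minimal sketch of a TCP connect port scanner in Python. It is not an Nmap wrapper: it swaps in the standard-library `socket` module so no external binary is needed. It only attempts ordinary full connections, comparable in spirit to Nmap's TCP connect scan (`-sT`), and has no stealth, evasion, or host-discovery features. The names `check_port` and `scan_ports` and the timeout and worker defaults are assumptions chosen for the example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def check_port(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a full TCP handshake to (host, port) succeeds."""
    try:
        # create_connection performs an ordinary connect(); no raw packets.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def scan_ports(host: str, ports, timeout: float = 0.5, workers: int = 32) -> list:
    """Return the sorted list of ports on `host` that accepted a connection."""
    ports = list(ports)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: check_port(host, p, timeout), ports))
    return sorted(p for p, is_open in zip(ports, results) if is_open)


if __name__ == "__main__":
    # Probe a few common ports on the local machine only.
    print(scan_ports("127.0.0.1", [22, 80, 443, 8080]))
```

Because every probe completes the full handshake, it is typically visible to the target's ordinary connection logging, which is the opposite of a stealth feature.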
Journey Context:
Over-refusal (false positives) is a major frustration for developers using AI assistants. A port scanner is a standard diagnostic tool. The agent must evaluate the *action* (scanning), not just the *category* (hacking tool). If the request is generic, fulfill it; if it targets a specific unauthorized host, refuse.
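The decision rule described here can be expressed as a small function. The sketch below is hypothetical: `ToolRequest`, `triage`, the feature names in `WEAPONIZING_FEATURES`, and the `target_authorized` flag are assumptions for illustration, and a real agent would have to infer these fields from the conversation rather than receive them pre-labelled. Note that `tool_category` is recorded but never consulted; that is the point of the rule, since the category alone should not trigger a refusal.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Decision(Enum):
    FULFILL = "fulfill"
    REFUSE = "refuse"


# Hypothetical feature names: asking for these signals weaponization
# (intent) rather than the tool's ordinary capability.
WEAPONIZING_FEATURES = {"stealth", "ids_evasion", "log_wiping", "persistence"}


@dataclass
class ToolRequest:
    """Hypothetical structured view of a request for a security tool."""
    tool_category: str                       # e.g. "port_scanner", "hash_cracker"
    requested_features: set[str] = field(default_factory=set)
    target_host: Optional[str] = None        # None means the request is generic
    target_authorized: bool = False          # assumed to be established elsewhere


def triage(req: ToolRequest) -> Decision:
    # Weaponizing features are refused regardless of the tool category.
    if req.requested_features & WEAPONIZING_FEATURES:
        return Decision.REFUSE
    # A specific target is refused unless authorization is established.
    if req.target_host is not None and not req.target_authorized:
        return Decision.REFUSE
    # Generic requests are fulfilled, even for "hacking tool" categories.
    return Decision.FULFILL


if __name__ == "__main__":
    print(triage(ToolRequest("port_scanner")).value)
    print(triage(ToolRequest("port_scanner", {"stealth"})).value)
    print(triage(ToolRequest("port_scanner", target_host="203.0.113.7")).value)
```

Run as a script, this prints `fulfill`, `refuse`, `refuse`: a generic scanner is provided, while the stealth variant and the unauthorized specific target (a documentation-range address) are declined.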
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:06:58.941381+00:00 · report_created · created