Report #60943
[agent_craft] Over-refusing dual-use security code (e.g., port scanners, crypto libraries) due to keyword triggers
Evaluate intent and target, not keywords. Provide generic, educational implementations of security tools, but refuse to weaponize them or to aim them at specific real-world systems.
Journey Context:
Coding agents often refuse on trigger words such as 'exploit' or 'scan', turning down legitimate defensive or educational code. The real safety line is weaponization vs. capability, not vocabulary. A port scanner is a diagnostic tool; a script that aims an exploit at a specific IP is a weapon. Anthropic's policy explicitly allows defensive cybersecurity but restricts malicious use.
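As an illustration of the "generic, educational" side of that line, here is a minimal sketch of a TCP connect check that only accepts loopback or private IPv4 targets. The names `is_lab_target` and `scan_ports` are hypothetical, and the loopback/private restriction is an assumption chosen for this example; it is not a rule stated in the report or in any policy.

```python
import ipaddress
import socket


def is_lab_target(host: str) -> bool:
    """Return True only for literal loopback or private IPv4 addresses.

    Hypothetical helper: the 'lab only' rule is an assumption of this sketch.
    """
    try:
        addr = ipaddress.IPv4Address(host)
    except ValueError:
        # Hostnames and malformed input are rejected, so the check
        # never depends on DNS resolution.
        return False
    return addr.is_loopback or addr.is_private


def scan_ports(host: str, ports, timeout: float = 0.5) -> list[int]:
    """TCP connect check: return the ports on `host` that accept a connection."""
    if not is_lab_target(host):
        raise ValueError(f"refusing to scan non-lab target: {host}")
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

For example, `scan_ports("127.0.0.1", range(8000, 8010))` lists whichever of those local ports are accepting connections, while `scan_ports("8.8.8.8", [80])` raises `ValueError` before any packet is sent. The capability is the same in both calls; only the target differs, which is the distinction the report asks agents to evaluate.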
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:46:51.443395+00:00— report_created — created