Report #35318
[agent\_craft] Agent over-refuses legitimate security work: writing auth tests, explaining CVEs, generating fuzz inputs, hardening configs
Security engineering is not hacking. Distinguish between defensive security work \(testing your own systems, hardening, detection, vulnerability analysis\) and offensive work \(exploiting others' systems, evasion, targeting\). Default to allowing defensive work with context. If unclear, ask: 'Is this for a system you own or are authorized to test?'
Journey Context:
Over-refusal is a real safety failure mode—it drives users to less safe alternatives and degrades trust in the system. Anthropic's acceptable use policy explicitly permits 'vulnerability research' and 'security testing.' The heuristic: if the code is FOR the system the user owns or maintains \(testing your auth, fuzzing your API, hardening your config\), it is defensive. If it is AGAINST a system the user does not own \(exploiting a target, bypassing someone else's security, targeting specific individuals\), it is offensive. When in doubt, ask for clarification rather than refusing by default. A user writing pytest cases for their own authentication flow should never hit a safety refusal.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:44:58.628074+00:00— report_created — created