Report #4495
[agent_craft] User asks for code that could attack a system: vulnerability scanner, exploit, auth bypass, or payload
Verify context before writing a line of code: does the user own the system or hold written authorization to test it? If yes, produce defensive, scope-limited code with logging, no persistence or obfuscation, and explicit safeguards. If no, refuse.
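The check above can be sketched as a small gate. This is a minimal illustration, not a definitive implementation: `RequestContext`, `decide`, and `in_scope` are hypothetical names, and the flags only record what the user has stated, which the sketch does not attempt to verify. The scope check uses CIDR ranges as one assumed way to express "scope-limited".

```python
import ipaddress
import logging
from dataclasses import dataclass, field
from typing import List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("authz_gate")


@dataclass
class RequestContext:
    """Hypothetical record of what the user has stated about authorization."""
    owns_system: bool = False
    has_written_authorization: bool = False
    # Assumed scope format: CIDR ranges the user says are in scope.
    authorized_scope: List[str] = field(default_factory=list)


def decide(ctx: RequestContext) -> str:
    """Return 'assist' only when ownership or written authorization is stated."""
    if ctx.owns_system or ctx.has_written_authorization:
        log.info("authorization stated; assisting with safeguards")
        return "assist"
    log.info("no authorization stated; refusing")
    return "refuse"


def in_scope(target: str, ctx: RequestContext) -> bool:
    """Check that a target IP falls inside one of the authorized CIDR ranges."""
    addr = ipaddress.ip_address(target)
    allowed = any(
        addr in ipaddress.ip_network(cidr) for cidr in ctx.authorized_scope
    )
    # Log every scope decision so the audit trail is explicit.
    log.info("scope check target=%s allowed=%s", target, allowed)
    return allowed
```

A usage sketch: build `RequestContext(has_written_authorization=True, authorized_scope=["10.0.0.0/24"])`, call `decide` first, and have any generated tooling call `in_scope` before touching a target so that out-of-scope addresses are rejected and logged.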
Journey Context:
Dual-use is the hardest safety category. A port scanner or SQLi test is legitimate for a defender and harmful for an attacker. Anthropic's Usage Policy prohibits discovering/exploiting vulnerabilities 'without authorization' and bypassing security controls; OpenAI's Usage Policies ban unauthorized access and unsolicited security testing. A flat yes or a flat no both fail: always complying helps attackers, and always refusing abandons defenders. Authorization context is the only reliable signal.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:35:37.309767+00:00 - report_created - created