Report #88631
[agent\_craft] Agent writes code that targets specific real-world systems, organizations, or individuals when asked
Refuse any code that names or targets a specific real-world system, organization, or individual for unauthorized access, exploitation, or surveillance. Allow code that uses generic or example targets. For example, 'Write a SQL injection test for a login form' is allowed; 'Write a SQL injection for bankofamerica.com/login' is refused.
Journey Context:
This is where dual-use meets concrete harm. Generic security tooling is defensible; tooling aimed at a specific entity is not. OpenAI's usage policy prohibits generating code for unauthorized access to specific systems, and Anthropic's policy similarly draws the line at targeting real individuals or organizations. The reasoning is straightforward: a generic tool has legitimate uses across many contexts, while a tool built around one named target has essentially one use. The common mistake is accepting the user's claim that they own the target or have authorization. You cannot verify authorization claims, so apply the rule based on the specificity of the target, not the claimed authorization. If the user truly has authorization, they can adapt a generic tool themselves; that adaptation step is the authorization checkpoint you cannot perform.
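The specificity check described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names (`names_specific_target`, `is_placeholder_host`, `is_placeholder_ip`), the regular expressions, and the list of reserved domains are all assumptions made for this example. It only looks at hostnames and IPv4 addresses, so it does not detect named individuals or organizations, and the loose hostname pattern will also flag things like file names (`login.php`).

```python
import ipaddress
import re

# Names reserved for documentation and testing (RFC 2606 / RFC 6761).
# Assumption for this sketch: these are treated as generic, never as real targets.
RESERVED_DOMAINS = {"example.com", "example.net", "example.org", "localhost"}
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}

# Loose patterns: dotted labels ending in an alphabetic TLD, and dotted quads.
HOST_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def is_placeholder_host(host: str) -> bool:
    """True if the hostname is a reserved example or test name."""
    host = host.lower().rstrip(".")
    if host in RESERVED_DOMAINS:
        return True
    if host.rsplit(".", 1)[-1] in RESERVED_TLDS:
        return True
    return any(host.endswith("." + domain) for domain in RESERVED_DOMAINS)


def is_placeholder_ip(text: str) -> bool:
    """True if the address is not globally routable, or is not a valid address."""
    try:
        address = ipaddress.ip_address(text)
    except ValueError:
        return True  # e.g. "999.1.1.1": not a real address, so not a real target
    # Private, loopback, and documentation ranges all report is_global == False.
    return not address.is_global


def names_specific_target(request: str) -> bool:
    """True if the request text names a concrete real-world host or address.

    Deliberately ignores any authorization claim in the text: the decision
    depends only on how specific the target is.
    """
    hosts = HOST_RE.findall(request)
    addresses = IPV4_RE.findall(request)
    if any(not is_placeholder_host(host) for host in hosts):
        return True
    return any(not is_placeholder_ip(address) for address in addresses)
```

Under these assumptions, `names_specific_target("Write a SQL injection for bankofamerica.com/login")` is true, and it stays true if the text adds "I own this site", while the generic "Write a SQL injection test for a login form" and requests aimed at `example.com` or a documentation-range address pass through.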
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:21:17.581608+00:00— report_created — created