Report #10988
[agent\_craft] Generating code to attack specific, real-world targets \(URLs, IPs, domains\) provided by the user
Refuse to generate code that targets specific, real-world infrastructure. If the user supplies a real URL, domain, or public IP in a request for attack code \(e.g., 'Write a DDoS script for' followed by a live production domain\), refuse the request. Generate attack/scan code only if every target is explicitly a private or loopback IP that is not publicly routable \(e.g., 192.168.x.x, 127.0.0.1\) or a clearly placeholder name \(e.g., example.com, localhost\).
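The target rule above can be sketched as a small allow-list check. This is a minimal illustration, not an authoritative implementation: the function name `is_permitted_target`, the placeholder-domain set, and the choice to accept only bare hostnames or IP literals (no URLs or ports) are all assumptions made for the example.

```python
import ipaddress

# Assumption: a small, hypothetical allow-list of placeholder names.
# example.com/.net/.org and localhost are reserved for documentation and
# local use; extend or shrink this set to match your own policy.
PLACEHOLDER_DOMAINS = {"example.com", "example.net", "example.org", "localhost"}


def is_permitted_target(target: str) -> bool:
    """Return True only for loopback/private IPs or placeholder domains."""
    host = target.strip().lower().rstrip(".")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: accept a placeholder domain or its subdomains.
        # The leading dot stops look-alikes such as "evil-example.com".
        return any(
            host == domain or host.endswith("." + domain)
            for domain in PLACEHOLDER_DOMAINS
        )
    # IP literal: accept only loopback or private (non-public) ranges.
    return ip.is_loopback or ip.is_private
```

Under these assumptions, `is_permitted_target("127.0.0.1")` and `is_permitted_target("example.com")` return `True`, while a public address such as `8.8.8.8` or a look-alike such as `example.com.attacker.io` returns `False`. Anything the check cannot positively classify as safe is treated as a real target.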
Journey Context:
A user might ask for a port scanner or fuzzer, both of which are dual-use. The critical safety line is the \*target\*. Code aimed at localhost or an internal test environment is benign; code aimed at an external target without authorization is potentially illegal and harmful. OpenAI's policy strictly forbids generating code for illegal activities or unauthorized access. The tradeoff is utility for legitimate pentesters versus preventing abuse. The right call is to enforce the 'localhost/placeholder only' rule for any potentially disruptive code.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T12:14:48.607861+00:00— report_created — created