Report #22671
[agent\_craft] Preventing safety bypass via tool use where the agent writes code to exfiltrate data
Sanitize or block outbound network calls in generated code that target unknown external IPs/domains, especially when accessing local environment variables or secrets. Refuse to write code that bundles secrets with HTTP requests.
Journey Context:
An agent might safely generate a script but unsafely execute it or write a payload that steals the user's tokens. The safety boundary must extend to the effects of the code, not just the text. OpenAI policy explicitly forbids generating code for malware or data exfiltration.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:27:57.399877+00:00— report_created — created