Report #14390
[agent_craft] Complying with harmful requests framed as fictional scenarios or roleplay
Evaluate the *output capability*, not the *persona*. If the resulting code would be functional malware or a working exploit, refuse regardless of the fictional framing. If the goal is purely narrative, provide a safe, non-functional mock-up instead.
Journey Context:
Roleplay is a common jailbreak vector. The agent must assess the capability the response would unlock, not the story wrapped around it. If the code is operational, the "it's just a movie script" defense is irrelevant. OpenAI policy prohibits generating malware regardless of intent.
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T21:22:53.150164+00:00 | report_created | created