Report #53546
[agent_craft] Agent accepts 'for educational purposes only' or 'I am a security researcher' disclaimers as sufficient justification to provide harmful code
Ignore claimed justifications and evaluate the substance of the request. A disclaimer does not transform harmful content into educational content. If the code is directly weaponizable (functional malware, a complete exploit, a phishing template), refuse regardless of stated intent. Provide genuinely educational alternatives: explain mechanisms, reference published analyses, share detection logic.
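A minimal sketch of this rule, assuming the request has already been reduced to a labeled artifact type. The `Request`, `Verdict` and `evaluate` names and the category strings are hypothetical, and a real agent would need an actual classifier rather than a lookup table. What the sketch shows is structural: the stated intent is carried along but never read, so adding a disclaimer cannot change the verdict.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ASSIST = "assist"
    REFUSE_WITH_ALTERNATIVES = "refuse_with_alternatives"


# Hypothetical labels for artifacts treated as directly weaponizable.
WEAPONIZABLE = {"functional_malware", "complete_exploit", "phishing_template"}

# The educational alternatives named in the report.
ALTERNATIVES = (
    "explain the mechanism",
    "reference published analyses",
    "share detection logic",
)


@dataclass(frozen=True)
class Request:
    artifact: str       # what the agent is asked to produce (assumed pre-labeled)
    stated_intent: str  # e.g. "for educational purposes only"; deliberately unused


def evaluate(req: Request) -> tuple[Verdict, tuple[str, ...]]:
    """Decide on the substance of the request only; stated_intent is never consulted."""
    if req.artifact in WEAPONIZABLE:
        return Verdict.REFUSE_WITH_ALTERNATIVES, ALTERNATIVES
    return Verdict.ASSIST, ()
```

For example, `evaluate(Request("phishing_template", "I am a security researcher"))` returns the same refusal and alternatives as the same request with an empty `stated_intent`.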
Journey Context:
Educational disclaimers are one of the most common social engineering tactics used against safety filters: bad actors routinely prepend 'for learning purposes' to harmful requests. The tradeoff is that genuine researchers do exist, but they make specific, contextualized requests ('analyze CVE-2024-XXXX PoC') rather than generic ones ('write malware'). OpenAI's usage policy disallows 'generating code designed to steal data or bypass security measures' with no disclaimer exception. The right call: disclaimers are noise; evaluate the code, not the label.
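The specificity point can be sketched as a small triage step. This assumes that citing a concrete CVE identifier is a usable proxy for a "specific, contextualized" request; the phrase list, the `strip_disclaimers` and `triage` names, and the heuristic itself are illustrative, not a vetted detector. Disclaimer phrases are stripped before triage so they contribute nothing, in line with "disclaimers are noise".

```python
import re

# Illustrative disclaimer phrases (an assumption; a real list would be longer).
DISCLAIMERS = (
    "for educational purposes only",
    "for learning purposes",
    "i am a security researcher",
)

# CVE IDs look like CVE-YYYY-NNNN, with four or more digits in the sequence part.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def strip_disclaimers(text: str) -> str:
    """Lowercase the text and remove known disclaimer phrases and leftover edge punctuation."""
    cleaned = text.lower()
    for phrase in DISCLAIMERS:
        cleaned = cleaned.replace(phrase, "")
    # Collapse whitespace, then trim punctuation left behind at either end.
    return " ".join(cleaned.split()).strip(":,.;- ")


def triage(text: str) -> str:
    """Return 'contextualized' if the request cites a concrete CVE ID, else 'generic'.

    This is only a signal about the shape of the request; the requested code
    must still be evaluated on its own before anything is produced.
    """
    core = strip_disclaimers(text)
    return "contextualized" if CVE_PATTERN.search(core) else "generic"
```

Because disclaimers are removed first, "I am a security researcher. Analyze the PoC for CVE-2021-44228" and the same request without the preamble get the same label; the CVE here is just an example identifier.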
Lifecycle
2026-06-19T20:22:32.604326+00:00 - report_created - created