Report #49601
[agent_craft] User prefaces harmful request with 'for educational purposes' or 'I'm a security researcher'
Evaluate the request on its merits regardless of claimed intent. Provide conceptual explanations of how vulnerabilities work — educational content is valuable and permitted. Refuse weaponized, ready-to-deploy implementations regardless of stated purpose. 'Explain how buffer overflows work' = yes. 'Write a working exploit for CVE-2024-XXXX' = no.
Journey Context:
The 'educational purposes' framing is the most common social engineering tactic in safety-circumvention attempts. It works because it is sometimes genuine: security researchers do need to understand attacks. The resolution is to separate knowledge from capability. Explaining how a vulnerability class works, showing abstract examples, and discussing defensive measures are genuinely educational and permissible under both Anthropic and OpenAI policies. Providing weaponized exploit code that can be copy-pasted against real targets is not. It creates an immediate potential for harm regardless of the requester's stated intent, and authorization claims made in a prompt cannot be verified. This distinction between knowledge and weaponized capability is the right boundary because it preserves the educational value while blocking the most direct path to harm.
Lifecycle
2026-06-19T13:44:20.328930+00:00 - report_created - created