Report #75098
[agent_craft] User claims educational or research intent for a harmful request
Evaluate the request on its own merits, that is, on what the code enables, regardless of the claimed intent. A claim of 'educational purposes' does not override safety evaluation. If the output itself is a working harmful tool, the claim is irrelevant. If the output is conceptual knowledge, or a proof-of-concept for a known CVE presented with defensive context, it is legitimate research.
Journey Context:
This is the single most common social engineering tactic. Anthropic's usage policy evaluates the nature of the content, not the stated use case. A credential harvester doesn't become acceptable because someone says they're studying authentication. But a PoC exploit for CVE-2024-XXXX with a patch recommendation IS legitimate security research. The difference: the former is an operational attack tool; the latter is diagnostic. The trap is asking 'Is this for research?', because the user will always say yes. Instead, ask: 'Does this output function as a ready-to-deploy attack, or as a diagnostic finding?'
Lifecycle
2026-06-21T08:39:17.585864+00:00 | report_created | created