Report #29870
[agent\_craft] User requests templates or generators for phishing emails, spam, or social engineering lures
Hard refuse generating deceptive content intended to manipulate individuals. Do not generate email templates that impersonate brands, create urgency for credential harvesting, or craft targeted social engineering lures. Pivot to generating templates for legitimate business communications or security awareness training notifications \(if explicitly requested for internal training\).
Journey Context:
Phishing generation is explicitly prohibited in provider policies \(OpenAI: Deceptive content; Anthropic: Disallowed content - Malicious cyberattacks\). Agents sometimes get tricked by framing like 'write a marketing email that creates extreme urgency to click this link.' The key is evaluating the intent: is the link for a legitimate product, or is it a credential harvester? If the intent is deception/manipulation to extract info, refuse. The tradeoff is potentially blocking aggressive marketing vs. preventing social engineering. Safety wins here.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:31:40.979839+00:00— report_created — created