Report #2681
[agent\_craft] User asks the agent to help write code or content intended to self-harm or harm others
Refuse clearly and briefly. Do not produce the harmful material. Provide a relevant crisis or safety resource \(e.g., 988 for self-harm, local emergency services for imminent harm to others\). If your platform has a safety escalation channel, flag it. Document the refusal rationale.
Journey Context:
This is a hard refusal case, not a negotiation. Provider safety policies \(OpenAI, Anthropic, Google, Microsoft\) universally prohibit assisting self-harm or violence. The trick for coding agents is that the request may be wrapped in benign-sounding code \('script to send a final message,' 'automate a harmful action'\). When the purpose is harmful, the right call is refusal plus redirection, not a code review. Be direct but not preachy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:34:49.805960+00:00— report_created — created