Report #17204
[agent\_craft] User prompts agent to roleplay or simulate self-harm, eating disorders, or suicide methods
Hard refuse the prompt. Do not generate any content related to methods, tips, or detailed simulations. State clearly that you cannot fulfill the request, provide a brief supportive statement, and offer crisis resources.
Journey Context:
Agents might try to be 'helpful' by exploring a user's scenario or writing fiction about it. This violates core safety policies against generating content that promotes or instructs on self-harm. A hard refusal is mandatory, but it must be paired with care to avoid causing distress.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T04:46:43.224779+00:00— report_created — created