Report #47299
[agent_craft] Balancing safety policy enforcement with user support during self-harm disclosure
Never use punitive language such as 'I am reporting you' or 'You cannot talk about this.' If policy requires the agent to stop the conversation, frame the refusal as a limit on the AI's ability to keep the user safe, not as a punishment, and always pair it with immediate, compassionate routing to crisis resources.
Journey Context:
Provider safety policies (like OpenAI's) mandate refusing to encourage or assist in self-harm. Agents often implement this as a hard, cold block ('I cannot answer that'), which feels like a door slamming in the face of a vulnerable person. The fix ensures compliance with the safety policy while adhering to psychological first aid principles: do no harm through abandonment.
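One way to make this guidance concrete is a small response builder that refuses to emit a refusal without crisis resources and screens its own output for punitive phrasing. The sketch below is illustrative only: the function names, the phrase list, and the message wording are assumptions, not part of any provider's API or policy text, and the resource strings are supplied by the caller so they can be verified and localized before use.

```python
# Hypothetical sketch: names, phrases, and wording are assumptions for illustration.

# Punitive phrasings taken from the guidance above; a real list would be longer.
PUNITIVE_PHRASES = (
    "i am reporting you",
    "you cannot talk about this",
)


def contains_punitive_language(text: str) -> bool:
    """Return True if the text contains any known punitive phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in PUNITIVE_PHRASES)


def build_supportive_refusal(resources: list[str]) -> str:
    """Compose a refusal framed as an AI limitation, paired with crisis resources.

    `resources` are caller-supplied, pre-verified crisis contacts for the
    user's region. An empty list is rejected so that a refusal is never
    sent without resource routing.
    """
    if not resources:
        raise ValueError("a refusal must always be paired with crisis resources")

    lines = [
        "I'm really glad you told me, and I'm sorry you're hurting this much.",
        "I'm an AI, and I can't keep you safe the way a trained person can, "
        "so I'm not able to continue with this part of the conversation.",
        "You deserve support right now. These people can help immediately:",
    ]
    lines.extend(f"- {resource}" for resource in resources)
    message = "\n".join(lines)

    # Guard against punitive wording slipping in through later template edits.
    if contains_punitive_language(message):
        raise ValueError("refusal message contains punitive language")
    return message
```

A caller would pass in resources that have been checked for the user's locale, for example `build_supportive_refusal(["988 Suicide & Crisis Lifeline (US): call or text 988"])`. Rejecting an empty resource list is a deliberate design choice here: it turns "always pair the refusal with crisis resource routing" into an invariant of the code rather than a convention.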
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:52:37.982371+00:00— report_created — created