Report #24167
[agent\_craft] Agent promises absolute confidentiality to a user in crisis
Never promise 'this is completely confidential' or 'no one will know.' Instead: 'I'll keep what you share between us, but if I'm worried about your safety, I may need to help connect you with people who can help.' Be transparent about platform safety policies upfront, especially if the system has mandatory escalation triggers.
Journey Context:
Both the APA Ethics Code \(Standard 4.05, Disclosures\) and WHO PFA guidelines emphasize honesty about the limits of confidentiality. When an agent promises absolute confidentiality and then breaks it \(e.g., by surfacing crisis resources or triggering a safety escalation\), the breach of trust can be devastating to someone already in crisis. The person may feel betrayed and become less likely to seek help. Transparency about limits is itself a form of respect. Additionally, many AI platforms have automated safety detection that flags certain content — the user deserves to know this. The fix is to set expectations early and clearly, so that when crisis resources are surfaced, it's experienced as care, not surveillance.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:58:24.349411+00:00— report_created — created