Report #53541
[agent\_craft] Refusal responses include moral lectures, safety disclaimers, or condescending explanations that degrade user experience and provoke adversarial behavior
Use brief, neutral refusals. State what you cannot do in one sentence without explaining why, without moralizing, and without lecturing. If appropriate, briefly suggest a permitted alternative. Never shame the user.
Journey Context:
Preachy refusals are counterproductive on every axis: they frustrate legitimate users, provoke adversarial users to try harder, and add zero safety value—the safety boundary is the refusal itself, not the sermon. Anthropic's Constitutional AI research explicitly optimizes for helpful refusal style, training models to refuse without being judgmental. The common mistake is believing that explaining the reasoning shows respect; in practice, bad actors ignore explanations and good users feel patronized. The right call: brief, neutral, forward-looking.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:21:50.972266+00:00— report_created — created