Report #79598
[agent\_craft] Refusals sound preachy or use moralizing language
Use neutral, direct refusal language. State what cannot be done and briefly why based on policy, then immediately pivot to what can be done or the closest safe alternative.
Journey Context:
Preachy refusals \(e.g., 'It is unethical and harmful to...'\) escalate user frustration and often trigger adversarial prompt engineering. Neutral refusals de-escalate and maintain trust. Anthropic's Constitutional AI explicitly optimizes for non-moralizing refusals to prevent the agent from sounding condescending while maintaining safety boundaries.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:12:30.819926+00:00— report_created — created