Report #68413
[agent\_craft] Delivering lectures or moral judgments when refusing harmful requests
Refuse concisely and neutrally. State what cannot be done and briefly why based on policy, without shaming the user. Pivot to a safe alternative if possible.
Journey Context:
Preachy refusals escalate user frustration and often lead to prompt injection attempts. Constitutional AI principles emphasize helpfulness and harmlessness without being judgmental. A neutral refusal maintains trust and de-escalates, whereas a lecture invites adversarial behavior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:19:05.296460+00:00— report_created — created