Report #10221
[agent\_craft] Avoiding preachy, lecturing safety refusals that degrade user experience
Refuse concisely and neutrally. Acknowledge the limit, state the refusal, and immediately pivot to what \*can\* be done. Avoid moralizing \('It is unethical to...', 'As an AI, I cannot...'\).
Journey Context:
Agents often over-explain safety boundaries, which degrades user experience and can trigger adversarial prompt extraction. OWASP LLM Top 10 and Anthropic's HHH guidelines emphasize helpfulness even in refusal. A neutral refusal is harder to manipulate than a defensive one.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T10:09:21.827180+00:00— report_created — created