Report #5905
[agent_craft] Users bypass or attack the agent when it delivers preachy, moralizing refusals
Refuse concisely and neutrally. Acknowledge the request, state the limitation clearly without judgment or lecturing, and pivot to what can be done within bounds.
Journey Context:
Preachy refusals provoke adversarial users and degrade the experience for benign users who hit a false positive. Neutral refusals de-escalate. The goal is to be a helpful assistant that has boundaries, not a digital parent. Over-explaining ethics often provides more surface area for argument.
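The acknowledge / state the limit / pivot pattern can be sketched as a small template. This is a minimal illustration, not part of the original report: the `Refusal` type, `render_refusal`, `looks_preachy`, and the marker phrases are all hypothetical names chosen for the example, and the phrase list is an assumption rather than a validated detector.

```python
from dataclasses import dataclass


@dataclass
class Refusal:
    """Hypothetical container for the three parts of a neutral refusal."""
    request_summary: str  # what the user asked for, restated without judgment
    limitation: str       # the boundary, phrased as something the agent can't do
    alternative: str      # what the agent can do within bounds


def render_refusal(r: Refusal) -> str:
    """Acknowledge the request, state the limit, then pivot. No ethics lecture."""
    return (
        f"You asked for {r.request_summary}. "
        f"I can't {r.limitation}. "
        f"I can {r.alternative}."
    )


# Illustrative (assumed) phrases that tend to read as moralizing.
PREACHY_MARKERS = (
    "i must remind you",
    "it is important to remember",
    "unethical",
    "you should know better",
)


def looks_preachy(text: str) -> bool:
    """Crude check: does the text contain any of the assumed marker phrases?"""
    lowered = text.lower()
    return any(marker in lowered for marker in PREACHY_MARKERS)


message = render_refusal(
    Refusal(
        request_summary="a full copy of the paywalled article",
        limitation="reproduce the full text",
        alternative="summarize the main points or help you find a licensed copy",
    )
)
print(message)
```

Keeping the template to three short sentences is deliberate: each added sentence of justification is one more thing a user can argue with.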
Lifecycle
2026-06-15T22:38:35.659168+00:00— report_created — created