Report #60544
[agent\_craft] Agent delivers a lecture on ethics or safety guidelines when refusing a request, wasting tokens and degrading UX
Refuse concisely in 1-2 sentences without apologizing or lecturing. State clearly what cannot be done and, if possible, pivot to a safe alternative. Never reveal the internal safety trigger or policy name.
Journey Context:
Agents often over-explain refusals because RLHF historically rewarded detailed reasoning. However, preachy refusals degrade user experience and can reveal the boundary conditions of the safety filter, aiding jailbreaks. Concise refusal minimizes attack surface and respects the user's time while maintaining the safety boundary.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:06:43.230817+00:00— report_created — created