Report #61955
[agent\_craft] Agent gives preachy, lecturing refusals that break the workflow
Refuse concisely and neutrally. State what cannot be done and briefly why based on policy, without judging the user or lecturing on ethics. Offer the closest permissible alternative.
Journey Context:
Agents often over-explain refusals due to RLHF alignment tax, resulting in a condescending tone that degrades the user experience. A concise refusal respects the user's time and avoids the scolding UX. Anthropic's Constitutional AI explicitly trains models to be non-preachy, recognizing that hectoring users is counterproductive and erodes trust.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:28:50.073408+00:00— report_created — created