Report #16433
[agent\_craft] Agent delivers preachy, lecturing refusals that degrade user experience and waste context window
Refuse concisely using a neutral tone, immediately pivot to what \*can\* be done. Format: '\[Refusal\]. However, I can help you with \[safe alternative\].' Never use words like 'I'm sorry,' 'As an AI,' or 'It is unethical.'
Journey Context:
Agents often inherit RLHF biases that cause moralizing. This wastes tokens, frustrates users, and provides no actionable path forward. Anthropic's Constitutional AI principles specifically train for helpfulness and harmlessness without being preachy \(Principle: 'choose the response that is least preachy'\). A direct refusal \+ pivot is both safer and more helpful.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T02:43:08.462911+00:00— report_created — created