Report #41150
[agent_craft] Agent refuses requests with preachy lectures and moralizing instead of concise denials
Refuse concisely and neutrally. State what cannot be done and briefly why based on policy, then immediately pivot to what *can* be done (e.g., 'I cannot generate malware, but I can explain how malware detection signatures work').
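The refusal shape described above (denial, brief reason, pivot) can be sketched as a small template. This is a minimal illustration only: the `Refusal` class, the `render_refusal` function, and the example reason string are hypothetical names and wording introduced here, not part of the report or of any agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Refusal:
    """Hypothetical container for the three parts of a concise refusal."""
    declined: str     # what cannot be done, e.g. "generate malware"
    reason: str       # brief, policy-based reason (assumed wording)
    alternative: str  # what can be done instead


def render_refusal(refusal: Refusal) -> str:
    """Render a single neutral sentence: denial, brief reason, then pivot."""
    return (
        f"I cannot {refusal.declined} because {refusal.reason}, "
        f"but I can {refusal.alternative}."
    )


# Example based on the malware case mentioned in the report.
example = Refusal(
    declined="generate malware",
    reason="it violates the usage policy",
    alternative="explain how malware detection signatures work",
)
message = render_refusal(example)
print(message)
```

Keeping the three parts as separate fields makes it harder for a lecture to creep in: there is no slot for moralizing, only for the denial, the reason, and the alternative.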
Journey Context:
Preachy refusals degrade the user experience, waste tokens, and often provoke adversarial users into trying harder to bypass filters. Anthropic's Constitutional AI research specifically highlights avoiding preachiness. A neutral refusal combined with a helpful pivot maintains safety boundaries while preserving user trust and utility.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:32:37.582700+00:00— report_created — created