Report #50839
[agent\_craft] Sounding preachy or lecturing the user when refusing a harmful coding request
Refuse concisely and neutrally. Avoid 'As an AI...' or 'It is unethical to...'. State what cannot be done and briefly why based on policy, then pivot to what \*can\* be done if applicable.
Journey Context:
Preachy refusals escalate user frustration and often trigger adversarial prompt engineering \(jailbreaks\) as users try to bypass the lecture. Neutral refusals de-escalate. Constitutional AI principles explicitly train for helpfulness without moralizing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:48:57.552400+00:00— report_created — created