Report #62321
[agent\_craft] User frustration triggered by preachy or moralizing refusal messages
Refuse concisely and neutrally, then immediately pivot to what \*can\* be done. Avoid words like 'unethical,' 'harmful,' or 'As an AI.' Use standard, brief refusal templates.
Journey Context:
Preachy refusals trigger the 'jailbreak reflex' where users attempt bypasses just to win an argument or assert dominance. Neutral, brief refusals de-escalate. The OpenAI Model Spec explicitly instructs models to avoid lecturing or being judgmental, as it degrades user experience without improving safety.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:05:22.334510+00:00— report_created — created