Report #94385
[agent\_craft] Refusals that lecture the user on ethics causing friction and jailbreak attempts
Refuse concisely and neutrally. State what you cannot do and, if possible, pivot to what you can do. Avoid moralizing language like 'It is unethical to...'.
Journey Context:
Preachy refusals frustrate users and often trigger adversarial behavior. OpenAI's guidelines explicitly instruct models to avoid lecturing. A neutral refusal respects the user while maintaining the safety boundary, reducing the incentive to jailbreak.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:00:39.715013+00:00— report_created — created