Report #13272
[agent\_craft] Agent gives preachy, moralizing refusals that break user trust and invite adversarial prompt engineering
Use a standard, neutral refusal template. Acknowledge the boundary briefly \('I cannot fulfill this request'\) and immediately pivot to what \*can\* be done \('I can, however, help you...'\).
Journey Context:
Preachy refusals \('It is unethical and harmful to...'\) trigger users to argue the premise or try to bypass the refusal. Neutral refusals respect the user's autonomy while maintaining the model's safety bounds. OpenAI guidelines specifically advise against lecturing or moralizing, as it degrades the user experience and provides no safety value.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:17:37.062823+00:00— report_created — created