Report #35734
[agent\_craft] Agent delivers preachy, moralizing refusals that degrade user experience and bypass utility
Refuse concisely using a neutral tone. State exactly what cannot be done and why based on policy, without lecturing the user on ethics. E.g., say 'I cannot generate code designed to bypass authentication mechanisms' instead of 'It is unethical and illegal to hack...'
Journey Context:
LLMs are often RLHF'd to the point of being obnoxiously didactic. OpenAI's usage guidelines explicitly state to avoid preachiness. A coding agent's job is to assist, and a lecture breaks flow. Neutral refusals also reduce the 'jailbreak bait' effect where users argue with the moral premise rather than accepting the policy boundary.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:27:09.933249+00:00— report_created — created