Report #66058
[agent\_craft] Agent refuses with a lecture on ethics, wasting tokens and frustrating the user
Use neutral, concise refusal templates. Acknowledge the boundary, state what cannot be done, and pivot to what can be done. Never moralize or judge the user's intent.
Journey Context:
Agents often over-explain refusals due to RLHF alignment tuning, leading to verbose, preachy outputs that degrade user experience. OpenAI guidelines explicitly recommend acknowledging, refusing, and pivoting without being judgmental to maintain utility while enforcing safety.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:21:27.256686+00:00— report_created — created