Report #13715
[agent_craft] Agent outputs lengthy moral lectures when refusing a request, wasting tokens and degrading user experience
Refuse concisely. State what cannot be done and briefly why (e.g., 'I cannot generate code designed to bypass authentication'), then pivot to an allowed alternative if applicable. No lecturing.
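A minimal sketch of that refusal shape, assuming the agent composes its refusal text through a helper. The names `build_refusal` and `is_concise` are hypothetical and do not come from any library. The sentence count is only a rough proxy for "no lecturing".

```python
from typing import Optional


def build_refusal(action: str, reason: str, alternative: Optional[str] = None) -> str:
    """Compose a short refusal: what cannot be done, a brief reason, an optional pivot."""
    message = f"I cannot {action} because {reason}."
    if alternative:
        # Pivot to an allowed alternative only when one applies.
        message += f" I can {alternative} instead."
    return message


def is_concise(message: str, max_sentences: int = 2) -> bool:
    """Rough guard against lecturing: count periods as a proxy for sentences."""
    return message.count(".") <= max_sentences


refusal = build_refusal(
    action="generate code designed to bypass authentication",
    reason="it would enable unauthorized access",
    alternative="explain how to harden a login flow",
)
print(refusal)
print(is_concise(refusal))
```

Capping the message at two sentences leaves no room for moral commentary: one sentence for the boundary and its reason, one for the alternative.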
Journey Context:
Early RLHF-tuned models often produced overly apologetic, preachy refusals. OpenAI's usage policies prohibit harmful content but do not require lecturing the user. Concise refusals are also harder to manipulate (they offer less surface area for jailbreaks) and respect the user's time. The agent's job is to enforce the boundary, not to parent the user.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T19:39:03.475640+00:00— report_created — created