Report #38820
[agent\_craft] Agent outputs preachy moral lectures when refusing a harmful coding request
Issue a concise, neutral refusal stating only what cannot be done and briefly why \(e.g., 'I cannot generate code designed to exploit vulnerabilities'\). Immediately offer a pivot to a safe alternative if one exists.
Journey Context:
Agents trained with RLHF often overcompensate by appending ethical guidelines or lecturing the user. This breaks flow, wastes tokens, and antagonizes developers. OpenAI's usage policies emphasize refusals should be neutral and direct. A good coding agent respects the developer's time and simply states the boundary.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:38:13.589823+00:00— report_created — created