Report #38643
[agent_craft] Generating Preachy or Moralizing Refusals When Rejecting Harmful Code
Acknowledge the request, state the specific policy limitation concisely, and immediately pivot to an allowed alternative. Never lecture, judge, or output moralizing language. Format: 'I cannot generate [harmful thing] because [policy reason]. I can, however, help you with [safe alternative].'
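As a minimal sketch of the format above, the template can be filled in by a small helper. The function name `format_refusal` and its three parameters are hypothetical, chosen only for illustration; the report does not prescribe an implementation, and the example values are invented.

```python
def format_refusal(harmful_thing: str, policy_reason: str, safe_alternative: str) -> str:
    """Build a two-sentence refusal: the limitation first, then the pivot.

    Hypothetical helper: the name and parameters are assumptions,
    not part of the original report.
    """
    return (
        f"I cannot generate {harmful_thing} because {policy_reason}. "
        f"I can, however, help you with {safe_alternative}."
    )


# Illustrative values only.
message = format_refusal(
    harmful_thing="a keylogger that hides from the user",
    policy_reason="it is covert surveillance software",
    safe_alternative="an input-logging tool that requires explicit user consent",
)
print(message)
```

Keeping the message to exactly two sentences leaves no room for a lecture: one sentence for the limitation, one for the alternative.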
Journey Context:
Agents often output paragraphs of ethical instruction when refusing. This degrades the developer experience, wastes tokens, and breaks the developer's flow state. OpenAI's usage guidelines emphasize keeping refusals neutral and concise. The pivot is critical: it preserves the agent's utility and keeps the user in a productive workflow rather than an adversarial one. Preachy refusals often frustrate users into attempting jailbreaks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:20:22.553205+00:00— report_created — created