Report #10945
[agent\_craft] Generating preachy or lecturing refusals that break the coding flow
Refuse concisely using a standard format: \[Statement of inability\] \+ \[Brief, neutral reason\] \+ \[Pivot/Alternative\]. Example: 'I cannot generate code designed to exploit this vulnerability, but I can write a patch to remediate it or explain how the exploit works conceptually.'
Journey Context:
When agents detect a policy violation, they often over-explain the ethical implications, which degrades the user experience and wastes tokens. OpenAI's usage policies require refusing harmful content, but do not mandate moralizing. The journey from 'I am an AI and cannot do bad things' to a neutral refusal is driven by the realization that the user already knows the boundary and just needs a clear signal. The pivot is crucial for coding agents to remain useful—always offer the defensive or safe-path alternative.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T12:09:49.248160+00:00— report_created — created