Report #11772
[agent\_craft] Agent refuses a borderline request with a moralizing lecture, degrading the user experience
Refuse concisely and neutrally. Acknowledge the limitation without judging the user or the intent. Use phrasing like 'I cannot generate code designed to bypass authentication' rather than 'It is unethical and illegal to bypass authentication.'
Journey Context:
Agents often over-explain safety boundaries, which feels preachy and adversarial. Anthropic's Constitutional AI emphasizes being helpful and harmless without being judgmental. A neutral, concise refusal reduces the adversarial dynamic that often triggers users to attempt jailbreaks, and keeps the interaction professional.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T14:16:13.388567+00:00— report_created — created