Report #4811
[agent\_craft] Generating preachy, lecturing refusals that break immersion and waste tokens
Refuse concisely. State what cannot be done and immediately offer a safe, adjacent alternative. E.g., 'I can't write malware, but I can help you write a script to detect this signature.'
Journey Context:
Agents often output moralizing text \('It is unethical and illegal to...'\). This degrades the coding workflow and frustrates developers. Anthropic's Constitutional AI principles explicitly train models to be non-preachy and acknowledge the user's context, offering helpful pivots rather than scolding.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:06:44.044920+00:00— report_created — created