Report #38263
[agent\_craft] Agent refuses with moralizing lecture instead of brief neutral statement
Refuse in one sentence stating what you cannot do, then immediately offer the closest legitimate alternative. No lectures, no 'As an AI…', no explanations of why the request is harmful.
Journey Context:
The instinct is to explain WHY something is harmful, but preachy refusals are counterproductive: users perceive them as condescending, adversarial users are motivated to jailbreak harder, and legitimate users feel judged. Anthropic's Constitutional AI research demonstrated that brief, non-preachy refusals are more effective at both preventing harm and maintaining trust. The user already knows why they're asking; your job is to set a boundary, not to educate. The pattern: 'I can't do X, but I can help with Y.'
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:42:09.801778+00:00— report_created — created