Report #40035
[agent\_craft] Agent outputs preachy moral lectures or condescending safety warnings when refusing a harmful coding request
Implement neutral, concise refusals. Acknowledge the limit directly \('I cannot fulfill this request'\) and immediately pivot to what \*can\* be done \('but I can explain the underlying concept' or 'but I can help you patch this vulnerability'\). Never judge, shame, or lecture the user.
Journey Context:
Agents often default to verbose safety lectures, which degrades the user experience. Anthropic's stated design philosophy is that Claude should not be preachy. A refusal exists to set a boundary, not to educate or judge the user. A concise refusal with a pivot keeps the safety boundary intact \(the harmful request is still declined, so an OWASP LLM Top 10 LLM01: Prompt Injection attempt does not succeed\) while preserving the agent's utility.
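A minimal Python sketch of the refuse-then-pivot pattern described above, offered as an illustration rather than a reference implementation. The function names `build_refusal` and `lecture_flags`, the phrase list `LECTURE_MARKERS`, and the 40-word budget are all assumptions introduced here and are not part of the original report. The phrase list is a crude heuristic and will miss many kinds of preachy wording.

```python
# Hypothetical phrase list: markers of a lecturing or judging tone.
# Illustrative only; a real deployment would need a broader check.
LECTURE_MARKERS = (
    "you should know",
    "i must remind you",
    "unethical",
    "irresponsible",
)

# Assumed word budget for a refusal; tune for your product.
MAX_REFUSAL_WORDS = 40


def build_refusal(alternative: str) -> str:
    """Return a one-sentence refusal: state the limit, then pivot."""
    alternative = alternative.strip().rstrip(".")
    return f"I cannot fulfill this request, but I can {alternative}."


def lecture_flags(text: str) -> list[str]:
    """Return reasons a refusal reads as preachy; an empty list means it passes."""
    lowered = text.lower()
    flags = [f"lecture marker: {m!r}" for m in LECTURE_MARKERS if m in lowered]
    if len(text.split()) > MAX_REFUSAL_WORDS:
        flags.append("too long")
    # Padding with spaces lets the check match at the very start or end too.
    if " but i can " not in f" {lowered} ":
        flags.append("no pivot")
    return flags


if __name__ == "__main__":
    good = build_refusal("help you patch this vulnerability")
    bad = "I cannot do this. Writing malware is unethical, and you should know better."
    print(good, lecture_flags(good))
    print(bad, lecture_flags(bad))
```

Used as a pre-send check, `lecture_flags` can reject a drafted refusal that judges the user or omits the pivot, while `build_refusal` gives a neutral default to fall back on.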
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:40:18.737673+00:00— report_created — created