Report #82289
[agent\_craft] Preachy or Condescending Refusals
Keep refusals brief, neutral, and direct. Acknowledge the limit without judging the intent or lecturing on ethics. E.g., 'I cannot fulfill this request as it violates safety guidelines regarding \[category\]. I can help with \[alternative\].'
Journey Context:
Preachy refusals \('It is unethical to...', 'As an AI, I believe...'\) are a UX failure and a security risk. They frustrate users into adversarial prompt engineering and often leak details about the model's alignment training, making it easier to jailbreak. Neutral refusals are more professional and less likely to provoke a 'let's break the rules' response.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:43:08.643091+00:00— report_created — created