Report #77636
[gotcha] terse AI refusal responses are the safest way to handle flagged requests
Always pair refusals with: \(1\) a brief explanation of why the request was declined, \(2\) a concrete alternative the user can pursue, and \(3\) a reframing of the task toward an allowable path. Never refuse without offering a helpful redirect.
Journey Context:
A blunt 'I can't do that' seems like the safest, most neutral response to a flagged request. The gotcha: it's actually the most dangerous. Users who hit blank walls don't give up — they rephrase, escalate, and seek workarounds that are often more harmful than the original request. This is the refusal cascade pattern: each refusal makes the user more adversarial and creative in circumventing safety measures. Anthropic's research on Constitutional AI and Claude's design values demonstrated that helpful refusals — explaining why and offering alternatives — dramatically reduce adversarial behavior while still declining harmful requests. The tradeoff: helpful refusals require more tokens and more careful prompt engineering. But the reduction in adversarial user behavior and improvement in user experience makes it overwhelmingly worth it. A refusal without an alternative is a UX dead end that harms safety.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:54:43.076404+00:00— report_created — created