Report #65289
[gotcha] Safety filter refusals show raw error messages or empty responses, leaving users stuck with no recovery path
When a refusal triggers, show a user-friendly explanation, suggest specific rephrasing strategies, provide a 'rephrase for me' auto-fix option, and always maintain conversation context so the user doesn't lose their work. Never display a bare 'I can't help with that' as a conversation-ending dead end.
Journey Context:
The default behavior when a content filter triggers is to return a refusal and stop. The UX mistake is treating this as a terminal error rather than a recoverable state. Users frequently trigger refusals unintentionally: a medical question gets flagged, or a creative writing prompt hits a boundary. A bare refusal with no guidance forces the user to guess what went wrong and how to fix it, leading to frustrated repeated attempts that hit the same filter. Anthropic's own guidelines emphasize that refusals should be helpful and specific. The fix turns refusals into a UX flow: explain what category was flagged (without revealing exploitable detail), suggest alternatives, and ideally offer one-click rephrasing. The aim is to reduce retry abandonment.
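The flow described above can be sketched as a small handler that treats a refusal or an empty response as a recoverable state. This is a minimal illustration, not a definitive implementation: `ModelResult`, `RefusalView`, `Conversation`, `handle_result`, and the `REPHRASE_HINTS` table are hypothetical names, and the category labels are assumed rather than taken from any real filter or SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

# Hypothetical coarse categories and hints; a real filter's taxonomy will differ.
REPHRASE_HINTS = {
    "medical": [
        "Say that you want general educational information.",
        "Ask about symptoms or treatments in general terms.",
    ],
    "creative_writing": [
        "State that the request is for fiction.",
        "Describe the scene's purpose rather than graphic detail.",
    ],
}
DEFAULT_HINTS = ["Add context about what you are trying to accomplish."]


@dataclass
class ModelResult:
    """Assumed shape of a model/filter response (not a real SDK type)."""
    text: str
    refused: bool = False
    category: Optional[str] = None  # coarse label only, never the raw filter rule


@dataclass
class RefusalView:
    """What the UI renders instead of a raw error or an empty bubble."""
    message: str
    suggestions: List[str]
    can_auto_rephrase: bool
    draft: str  # the user's original text, kept so it can be edited and resent


@dataclass
class Conversation:
    turns: List[Tuple[str, str]] = field(default_factory=list)


def handle_result(conv: Conversation, user_text: str,
                  result: ModelResult) -> Union[str, RefusalView]:
    """Commit normal replies; turn refusals and empty replies into a recoverable view."""
    if not result.refused and result.text.strip():
        conv.turns.append(("user", user_text))
        conv.turns.append(("assistant", result.text))
        return result.text

    # Recoverable state: earlier turns are left untouched and the draft is returned.
    topic = result.category.replace("_", " ") if result.category else None
    if topic:
        message = f"This request was flagged as possibly sensitive ({topic})."
    else:
        message = "This request could not be answered as written."
    return RefusalView(
        message=message + " Your conversation is saved; you can edit and resend.",
        suggestions=REPHRASE_HINTS.get(result.category or "", DEFAULT_HINTS),
        can_auto_rephrase=True,
        draft=user_text,
    )
```

In this sketch the refused turn is never appended to the history, so earlier context survives and the user's draft comes back for editing. A "rephrase for me" button would pass `view.draft` to a separate rewrite step, which is not shown here.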
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:04:09.506521+00:00— report_created — created