Report #54007
[gotcha] AI model refusals exposed as raw API error messages in product UI
Detect refusal patterns in model output, either with a moderation API or by parsing the response, and substitute product-appropriate messaging. Never surface raw refusal text such as "I cannot fulfill this request" directly to end users. Rewrite it in your product voice and provide a constructive next step.
Journey Context:
When a model refuses because of safety filters or content policies, the raw refusal text is designed for developers, not end users. Exposing it directly is jarring, breaks product voice, and can reveal internal safety mechanisms that bad actors can probe. The common mistake is treating the model refusal as just another string to render. The fix is a refusal detection layer: either pre-screen input with a moderation API, or parse responses for refusal patterns. Then map each refusal category to a product-appropriate message that explains what happened and offers alternatives. The tradeoff is that over-sanitizing refusals can hide legitimate safety information the user needs, such as why their content was flagged. The right balance is to preserve the "what" and the "why" of the refusal, rephrase both in your product language, and always offer a next action.
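A minimal sketch of the response-parsing branch described above, in Python. Everything named here is hypothetical and chosen for illustration: the regular expressions, the two categories ("policy" and "generic"), the message wording, and the helpers classify_refusal and render_for_user. The patterns are guesses at common refusal phrasings and would need tuning against the output of the model you actually use. The moderation pre-screen is not shown.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: refusals open the response, so only the first part is inspected.
# This limits false positives on long answers that merely quote a refusal.
HEAD_CHARS = 200

# Hypothetical refusal phrasings; tune against your own model's real output.
REFUSAL_RE = re.compile(
    r"\bI(?:['’]m| am)? (?:cannot|can['’]t|can not|unable to|not able to)\b"
    r"[^.]{0,40}?\b(?:fulfill|help|assist|comply|provide)",
    re.IGNORECASE,
)
# Hypothetical markers that the refusal cites a policy as its reason.
POLICY_RE = re.compile(
    r"\b(?:content policy|safety guidelines|usage polic(?:y|ies))\b",
    re.IGNORECASE,
)

# Product-voice replacements: each one says what happened and offers a next step.
PRODUCT_MESSAGES = {
    "policy": (
        "We couldn't complete this because the request appears to conflict "
        "with our content guidelines. Try rephrasing it or removing "
        "sensitive details."
    ),
    "generic": (
        "We couldn't complete that request. Try rewording it or narrowing "
        "what you are asking for."
    ),
}


@dataclass
class Rendered:
    text: str                        # what the UI should display
    refusal_category: Optional[str]  # None when the output was not a refusal


def classify_refusal(model_output: str) -> Optional[str]:
    """Return a refusal category, or None if the output does not look like one."""
    head = model_output.strip()[:HEAD_CHARS]
    if not REFUSAL_RE.search(head):
        return None
    return "policy" if POLICY_RE.search(head) else "generic"


def render_for_user(model_output: str) -> Rendered:
    """Pass normal output through; swap refusals for product messaging."""
    category = classify_refusal(model_output)
    if category is None:
        return Rendered(model_output, None)
    return Rendered(PRODUCT_MESSAGES[category], category)
```

Returning the category alongside the display text lets the caller log refusals or attach a category-specific follow-up action without re-parsing the model output. Pattern matching of this kind will miss refusals phrased in unexpected ways, so treat it as a first pass, not a guarantee.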
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:08:50.010767+00:00— report_created — created