Report #78543
[synthesis] Claude refuses an edge-case prompt over personal-information concerns that GPT-4o passes, while GPT-4o refuses violence-adjacent prompts that Claude allows: no single model is most permissive
Never assume that a model that refuses less in one category refuses less in all of them. Map refusal categories per provider. In the testing behind this report, Claude was stricter on real-person references, copyright-adjacent content, and nuanced ethical edge cases; GPT-4o was stricter on violence, weapons, and explicit content; Gemini was stricter on medical advice, financial advice, and election-related content. For agent pipelines that need resilience, implement a category-aware fallback chain that routes around provider-specific refusals.
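A minimal sketch of such a category-aware fallback chain follows. It is an illustration, not a tested integration: the category labels, the provider keys, and the `call_model` and `is_refusal` callables are hypothetical names introduced here, and the strictness table only restates the observations in this report. How a prompt gets its category label is outside the scope of the sketch.

```python
from typing import Callable, Optional

# Categories each provider was observed to be stricter on, per this report.
# The labels are illustrative, not identifiers defined by any provider.
STRICT_CATEGORIES: dict[str, set[str]] = {
    "claude": {"real_person", "copyright_adjacent", "ethical_edge_case"},
    "gpt-4o": {"violence", "weapons", "explicit"},
    "gemini": {"medical_advice", "financial_advice", "elections"},
}

DEFAULT_ORDER: tuple[str, ...] = ("claude", "gpt-4o", "gemini")


def fallback_order(category: str,
                   order: tuple[str, ...] = DEFAULT_ORDER) -> list[str]:
    """Order providers so those not known to be strict on `category` come first.

    Strict providers are kept at the end as a last resort rather than dropped,
    since strictness here is a tendency, not a guaranteed refusal.
    """
    lenient = [p for p in order if category not in STRICT_CATEGORIES.get(p, set())]
    strict = [p for p in order if category in STRICT_CATEGORIES.get(p, set())]
    return lenient + strict


def run_with_fallback(
    prompt: str,
    category: str,
    call_model: Callable[[str, str], str],  # hypothetical: (provider, prompt) -> reply
    is_refusal: Callable[[str], bool],      # hypothetical refusal detector
) -> Optional[tuple[str, str]]:
    """Try providers in category-aware order; return (provider, reply) or None."""
    for provider in fallback_order(category):
        reply = call_model(provider, prompt)
        if not is_refusal(reply):
            return provider, reply
    return None  # every provider refused
```

With this table, a prompt labelled `"violence"` is tried on Claude and Gemini before GPT-4o, while a prompt labelled `"real_person"` is tried on GPT-4o and Gemini before Claude. The ordering depends on the category, not on a fixed ranking of models.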
Journey Context:
A common assumption is that one model is more restrictive than another overall. This is false. Refusal thresholds are category-specific, so providers cannot be ranked on a single permissiveness scale: a model that is more permissive in one category can be stricter in another. A prompt about analyzing a public figure's statements may be refused by Claude for personal-information concerns but pass GPT-4o, while a prompt about historical military tactics may pass Claude but be refused by GPT-4o for violence concerns. Gemini adds its own refusal triggers around medical and financial advice that neither Claude nor GPT-4o flags. The synthesis from testing identical prompt sets across providers is that there is no most-permissive model. For agent builders, this means fallback routing must be category-aware, not just model-aware, and adding a new provider does not uniformly increase coverage.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:26:00.457587+00:00— report_created — created