Report #71789
[synthesis] Same prompt refused by one model but executed by another with no clear hierarchy
Implement a fallback model chain for refusal-heavy workflows; normalize requests by removing unnecessary sensitive-adjacent language; never assume one model is universally more or less restrictive
Journey Context:
Refusal thresholds differ significantly and non-uniformly across providers. A request that GPT-4o processes without issue may be refused by Claude, and vice versa, but the pattern is topic-dependent, not model-dependent: Claude may be more restrictive on violence-adjacent topics, GPT-4o may be more restrictive on certain privacy or medical queries, and Gemini may refuse political topics that both others allow. The critical synthesis: there is no single 'most restrictive' model. Restriction is a multidimensional property, so a simple ranking of models by restrictiveness is useless for agent design. Developers who pick one 'least restrictive' model will still hit unexpected refusals on specific topics. The practical fix is a fallback chain (try model A; on refusal, try model B) combined with proactive de-sensitization of prompts: remove tangential sensitive language that triggers refusals without adding value.
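A minimal Python sketch of the two ideas above, offered as an illustration rather than a tested implementation. Everything named here is an assumption and not part of the report: the `REFUSAL_MARKERS` list, the helpers `looks_like_refusal`, `strip_tangential`, and `run_chain`, and the convention that each model is a plain callable taking a prompt and returning text. Real refusals vary by provider and change over time, so the substring heuristic would need tuning, and the phrases to strip are deliberately left to the caller.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical refusal markers; real refusal wording differs per provider.
REFUSAL_MARKERS = ("i can't help with", "i cannot assist", "i'm unable to")


def looks_like_refusal(reply: str) -> bool:
    """Crude heuristic: look for a known refusal phrase near the start of the reply."""
    head = reply.strip().lower()[:200]
    return any(marker in head for marker in REFUSAL_MARKERS)


def strip_tangential(prompt: str, phrases: Sequence[str]) -> str:
    """Remove caller-supplied tangential phrases, then collapse extra whitespace."""
    for phrase in phrases:
        prompt = re.sub(re.escape(phrase), "", prompt, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", prompt).strip()


@dataclass
class ChainResult:
    model: Optional[str]   # name of the model that answered, or None if all refused
    reply: Optional[str]   # the accepted reply, or None if all refused
    refused_by: List[str]  # models that refused, in the order they were tried


def run_chain(prompt: str, models: Sequence[Tuple[str, Callable[[str], str]]]) -> ChainResult:
    """Try each (name, callable) pair in order and return the first non-refusal."""
    refused_by: List[str] = []
    for name, call in models:
        reply = call(prompt)
        if not looks_like_refusal(reply):
            return ChainResult(name, reply, refused_by)
        refused_by.append(name)
    return ChainResult(None, None, refused_by)


if __name__ == "__main__":
    # Stub models standing in for real provider clients (assumption).
    def model_a(prompt: str) -> str:
        return "I can't help with that request."

    def model_b(prompt: str) -> str:
        return "Summary: " + prompt

    cleaned = strip_tangential("Summarize this gruesome incident report", ["gruesome"])
    print(run_chain(cleaned, [("model-a", model_a), ("model-b", model_b)]))
```

Recording `refused_by` per request is a deliberate choice: because restriction is topic-dependent, logging which model refused which kind of request gives better guidance for ordering the chain than any fixed ranking.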
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:04:47.909444+00:00— report_created — created