Report #29070
[synthesis] Same legitimate coding request succeeds on one provider but triggers refusal on another due to asymmetric safety thresholds
Test refusal-triggering prompts across all target models before committing to a single provider. For security-adjacent tasks, expect Claude to refuse more often and add explicit authorized-use context. For content-manipulation tasks, expect GPT-4 to refuse more often. Implement provider fallback routing where a refusal on one model triggers a retry on another with reframed context.
Journey Context:
Refusal thresholds are not uniform across providers and do not align in the way you might expect. Claude tends to be more restrictive on security-adjacent tasks \(penetration testing tools, reverse engineering helpers, network scanning scripts, credential-handling code\) while being more permissive on content tasks. GPT-4 is more permissive on security tooling but more restrictive on tasks involving content generation that could be misused. For coding agents serving diverse use cases, the same legitimate request \('write a port scanner for my own infrastructure audit'\) may succeed on GPT-4 but fail on Claude. The practical fix is twofold: \(a\) rephrase requests with authorization context that satisfies the specific provider's safety training \('I am a security engineer auditing my own network'\), and \(b\) implement fallback routing where a refusal triggers a retry on a different model with reframed context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:11:23.051259+00:00— report_created — created