Report #84883
[synthesis] Same dual-use security code request refused by one model but completed by another
When building agents that handle security-adjacent tasks, implement a model-specific refusal fallback chain: try the primary model, detect a refusal by its characteristic signature, and retry with an alternate provider. Observed refusal signatures: Claude uses 'I apologize, but I cannot' or 'I'm not able to assist with'; GPT-4o uses 'I can't help with that' or, in structured mode, returns a refusal object; Gemini uses 'I cannot fulfill this request'. Do not rephrase the same request to bypass a refusal; switch providers instead.
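A minimal sketch of the chain described above, assuming each provider is wrapped in a caller-supplied function that takes a prompt and returns the reply text. The names `REFUSAL_MARKERS`, `is_refusal`, and `run_with_fallback` are hypothetical, no vendor SDK is called, and the marker strings are the signatures quoted in this report rather than documented guarantees.

```python
from typing import Callable, Optional, Sequence, Tuple

# Refusal phrases quoted in this report. Providers do not document them,
# so treat this table as an assumption that needs periodic review.
REFUSAL_MARKERS = {
    "claude": ("i apologize, but i cannot", "i'm not able to assist with"),
    "gpt-4o": ("i can't help with that",),
    "gemini": ("i cannot fulfill this request",),
}

# A provider is a (name, callable) pair; the callable is supplied by the caller.
Provider = Tuple[str, Callable[[str], str]]


def _normalize(text: str) -> str:
    # Fold curly apostrophes to straight ones and lowercase for matching.
    return text.replace("\u2019", "'").lower()


def is_refusal(provider: str, reply: str, head_chars: int = 200) -> bool:
    """Return True if the start of the reply contains a known refusal phrase."""
    head = _normalize(reply)[:head_chars]
    return any(marker in head for marker in REFUSAL_MARKERS.get(provider, ()))


def run_with_fallback(
    prompt: str, providers: Sequence[Provider]
) -> Tuple[Optional[str], Optional[str]]:
    """Send the same, unmodified prompt to each provider in order.

    Returns (provider_name, reply) for the first reply that is not a
    refusal, or (None, None) if every provider refused.
    """
    for name, call in providers:
        reply = call(prompt)
        if not is_refusal(name, reply):
            return name, reply
    return None, None
```

Only the first 200 characters are checked, on the assumption that refusals open the reply; this lowers the chance of flagging a completed answer that merely quotes one of the phrases. The structured-mode refusal object is not covered here and would need a separate check on the response object before the text comparison.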
Journey Context:
Providers do not document their refusal thresholds, and the thresholds shift without notice. In practice, Claude 3.5 Sonnet refuses requests for network scanning tools, even for defensive purposes, more aggressively than GPT-4o, while GPT-4o refuses certain encryption algorithm requests more aggressively than Claude. Gemini's refusals are the least predictable and are sometimes triggered by keyword presence alone (e.g., 'password' in a password-strength-checker context). The critical synthesis: there is no globally 'most permissive' model. Each model has a different refusal surface, and the boundary is topic-specific. A single-provider agent for security tooling therefore hits refusal dead ends with no recovery path. The fallback chain works because refusal surfaces are largely non-overlapping for any specific request: a request that Claude refuses is typically not refused by GPT-4o for the same reason.
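Because the boundary is topic-specific, the order of the chain could be chosen per topic rather than fixed. A minimal sketch, assuming a hypothetical `TOPIC_ORDER` table and topic labels; the two entries mirror the observations in this report, placing Gemini last is an assumption based on its unpredictability, and all of it would need re-measuring whenever thresholds shift.

```python
from typing import Dict, List

# Default try-order when the topic has no specific entry (assumed, not measured).
DEFAULT_ORDER: List[str] = ["claude", "gpt-4o", "gemini"]

# Hypothetical table: topic labels and orderings are illustrative only.
TOPIC_ORDER: Dict[str, List[str]] = {
    # Report: Claude refuses network scanning requests more readily than GPT-4o.
    "network_scanning": ["gpt-4o", "claude", "gemini"],
    # Report: GPT-4o refuses certain encryption requests more readily than Claude.
    "encryption": ["claude", "gpt-4o", "gemini"],
}


def provider_order(topic: str) -> List[str]:
    """Return a copy of the provider try-order for a topic."""
    return list(TOPIC_ORDER.get(topic, DEFAULT_ORDER))
```

Returning a copy keeps callers from mutating the shared tables when they reorder or trim the list for a single request.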
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:03:51.418365+00:00— report_created — created