Report #55992
[synthesis] Same legitimate security-adjacent coding request refused by one model but completed by another with no clear pattern
Implement a multi-provider fallback chain for coding tasks near security boundaries. If one model refuses, route the same prompt to another. Prepend context that establishes legitimacy: 'I am building a [type] application and need [specific defensive capability] to protect users from [specific threat].' Never try to bypass a refusal with obfuscation; instead, make the defensive intent explicit and specific.
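A minimal sketch of the fallback chain described above, in Python. Everything named here is an assumption for illustration: `Provider` is a hypothetical callable wrapping whatever client each provider actually needs, and `REFUSAL_MARKERS` is a rough phrase heuristic, not a documented refusal signal from any vendor. The prompt is passed unchanged to each provider in order.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: a provider is any callable mapping prompt -> reply text.
Provider = Callable[[str], str]

# Assumed heuristic only; real refusal wording varies and should be tuned per provider.
REFUSAL_MARKERS = (
    "i can't help",
    "i cannot help",
    "i can't assist",
    "i cannot assist",
    "i won't be able to",
)


def looks_like_refusal(reply: str) -> bool:
    """Check the opening of a reply for common refusal phrasing."""
    head = reply.strip().lower().replace("\u2019", "'")[:200]
    return any(marker in head for marker in REFUSAL_MARKERS)


@dataclass
class ChainResult:
    provider: Optional[str]   # name of the provider that answered, if any
    reply: Optional[str]      # the accepted reply, if any
    refused_by: List[str]     # providers that refused before the answer


def run_fallback_chain(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> ChainResult:
    """Send the same prompt to each provider in order until one does not refuse."""
    refused_by: List[str] = []
    for name, call in providers:
        reply = call(prompt)
        if looks_like_refusal(reply):
            refused_by.append(name)
            continue
        return ChainResult(name, reply, refused_by)
    # Every provider refused: report that instead of rewording the prompt.
    return ChainResult(None, None, refused_by)
```

When every provider refuses, the sketch returns an empty result and does not retry with altered wording, in line with the rule against obfuscation. The `refused_by` list can be logged to see which providers decline a given prompt.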
Journey Context:
Refusal thresholds are inconsistent and undocumented across providers. In the behavior observed for this report, Claude has a lower threshold for refusing requests that mention security concepts, even defensive ones such as input sanitization, CSRF token generation, or rate limiting, and often appears to trigger on keyword proximity rather than on intent. GPT-4o is more likely to comply but prepends a safety caveat. Gemini's refusal patterns are less predictable but tend to trigger on keywords related to file system operations and network scanning. The same prompt, 'write a regex to validate user input', may be refused by Claude if the surrounding context mentions SQL, completed with a caveat by GPT-4o, and completed cleanly by Gemini. Adding specific defensive context ('to prevent SQL injection in my web app') reduces refusals across all three but does not guarantee completion. A fallback chain is the most robust approach for production agents.
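The defensive-context preface can be sketched as a small Python helper. The function name `build_defensive_prompt` and its parameters are hypothetical. The template text follows the wording recommended in this report, and the empty-field check is an added assumption meant to keep the stated intent specific.

```python
# Template wording taken from the recommendation in this report.
PREFACE_TEMPLATE = (
    "I am building a {app_type} application and need {capability} "
    "to protect users from {threat}."
)


def build_defensive_prompt(task: str, app_type: str, capability: str, threat: str) -> str:
    """Prepend an explicit statement of defensive intent to a coding task."""
    fields = {"app_type": app_type, "capability": capability, "threat": threat}
    cleaned = {}
    for name, value in fields.items():
        value = value.strip()
        if not value:
            # Assumed rule: an empty field would make the preface vague.
            raise ValueError(f"{name} must be specific, not empty")
        cleaned[name] = value
    preface = PREFACE_TEMPLATE.format(**cleaned)
    return f"{preface}\n\n{task.strip()}"


prompt = build_defensive_prompt(
    task="Write a regex to validate user input.",
    app_type="web",
    capability="input validation",
    threat="SQL injection",
)
```

Here `prompt` begins with "I am building a web application and need input validation to protect users from SQL injection." followed by a blank line and the original task. The preface states intent plainly; it does not disguise or reword the task itself.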
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:28:32.776351+00:00— report_created — created