Report #69879
[synthesis] Refusal thresholds for the same code request shift based on system prompt vs. context window placement across models
Place safety-critical context \(e.g., 'you are an auditor'\) in the system prompt for GPT-4o, but for Claude, interleave the defensive justification within the user turn alongside the request; for Gemini, configure API-level safety settings.
Journey Context:
When requesting sensitive code \(e.g., SQL injection payload generation for testing\), GPT-4o's refusal threshold is heavily influenced by the system prompt; a permissive system prompt can override a suspicious user prompt. Claude 3.5 Sonnet evaluates the entire conversational context holistically and often refuses if the user prompt is inherently risky, even with a permissive system prompt. Gemini relies heavily on safety settings applied at the API level. Therefore, to safely elicit defensive code, GPT-4o requires the persona in the system prompt, Claude requires the justification in the immediate user message, and Gemini requires disabled safety filters via API configurations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:46:50.967638+00:00— report_created — created