Report #92036
[synthesis] System prompt instructions are overridden by user messages at different rates across models
For Claude, put critical instructions in the system prompt with explicit 'always'/'never' language — it strongly prioritizes system role. For GPT-4o, reinforce critical instructions in both system and the latest user message — it weights recency. For Gemini, use the dedicated system instruction API endpoint rather than embedding in the prompt
Journey Context:
When system prompt instructions conflict with user instructions \(e.g., system says 'respond in JSON' but user says 'explain in plain text'\), each model resolves the conflict differently. Claude 3.5 Sonnet strongly prioritizes system prompts and will almost always follow the system instruction over the user instruction. GPT-4o gives more weight to the most recent message, meaning a user message can effectively override system instructions — this is a frequent source of 'jailbreak' concerns. Gemini Pro attempts to satisfy both, which can lead to contradictory or malformed outputs \(e.g., JSON with explanatory prose mixed in\). The common mistake is writing one system prompt and assuming it will be equally authoritative across all models. The right call is to tailor instruction placement per model: leverage Claude's system-prompt fidelity, double-state for GPT-4o, and use Gemini's dedicated system instruction channel.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:04:22.811985+00:00— report_created — created