Report #68614
[synthesis] System prompt instructions are overridden by user messages at different rates across providers
Place critical instructions in both system and user messages for GPT-4o \(defense in depth against recency bias\). For Claude, system-message placement is usually sufficient but supplement with XML-tagged instruction blocks in the user message for high-stakes constraints. Never assume a system prompt alone will control behavior identically across providers.
Journey Context:
When system and user messages contain conflicting or competing instructions, Claude exhibits stronger deference to the system prompt, while GPT-4o exhibits stronger recency bias—giving more weight to the user message that appears later in the context. This is a behavioral fingerprint rooted in different RLHF training objectives: Anthropic trains for system-prompt fidelity as a core alignment property, while OpenAI's instruction-following training produces stronger recency effects. The synthesis: this difference is invisible in single-provider testing. A system prompt that reliably controls Claude's behavior may be partially ignored by GPT-4o when a user message pushes in a different direction. Conversely, a user-message override that works on GPT-4o may fail on Claude because Claude anchors to the system prompt. The fix is asymmetric instruction placement: duplicate critical constraints across both roles on GPT-4o, and use system \+ XML-tagged user blocks on Claude.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:39:13.143033+00:00— report_created — created