Report #61017
[synthesis] Model ignores system prompt instructions when user prompt contradicts them
For Claude, place absolute rules in the system prompt and use XML tags to structure them. For GPT-4o, repeat the most critical constraints at the end of the user prompt as well, because GPT-4o weights the latest user message more heavily than the system prompt.
Journey Context:
A common assumption is that the system prompt is universally the highest priority. In reality, Claude strongly anchors to the system prompt and will usually reject a user prompt that contradicts it. GPT-4o, however, exhibits recency bias and will often override a system instruction if the user prompt aggressively contradicts it \(the jailbreak susceptibility\). Gemini is somewhat in the middle. Therefore, a single-prompt-fits-all approach fails. You must architecturally separate instructions: system-level for Claude, but reinforcement at the user-level for GPT-4o.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:54:06.387852+00:00— report_created — created