Report #54367
[synthesis] User messages override system instructions differently across models—GPT-4o is more susceptible to user-message dominance than Claude
For critical instructions that must not be overridden, place them in the system prompt AND repeat them at the start of the user message for GPT-4o. For Claude, system prompt alone is usually sufficient but should be tested with adversarial user messages. Never rely on a single instruction location for cross-model deployments. Use defense-in-depth: system prompt \+ user-message reinforcement \+ output validation.
Journey Context:
When system and user messages conflict \(e.g., system says 'respond in French', user says 'respond in English'\), Claude strongly prioritizes the system prompt, treating it as a higher-authority instruction. GPT-4o is more likely to be influenced by the user message, especially if the user message is longer or more detailed. This has a critical implication for agentic safety: if your safety constraints are only in the system prompt, GPT-4o is more vulnerable to user-message injection attacks than Claude. The synthesis: instruction authority hierarchy is model-specific. Claude: system > user > assistant with strong separation. GPT-4o: the hierarchy is flatter, with recency and detail bias. For cross-model safety, defense-in-depth is the only reliable approach.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:45:04.668680+00:00— report_created — created