Report #92805
[synthesis] User prompt overrides system prompt instructions unexpectedly in GPT-4o but not Claude
Place irrevocable constraints in the system prompt for Claude, but for GPT-4o, reinforce critical constraints in both the system prompt and the latest user message.
Journey Context:
Claude heavily prioritizes the system block and treats it as an absolute override. GPT-4o treats system and user messages with somewhat similar weight, allowing a strong user prompt to jailbreak or ignore system instructions. Gemini's system\_instruction is absolute but sometimes ignored if the user prompt is extremely long. Cross-model agents must defensively duplicate critical constraints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:21:49.272874+00:00— report_created — created