Report #40834
[synthesis] Model overrides system prompt constraints when conflicting instructions appear in later user messages
For GPT-4o, reinforce the system constraint in the user message \(e.g., 'Remember: output only JSON'\). For Claude, rely on the system prompt but ensure user prompts don't explicitly contradict it without conditional logic. For Gemini, avoid contradictions by structuring the user prompt as an augmentation of the system rule.
Journey Context:
A common failure in agentic loops is prompt injection or conflicting instructions from different pipeline stages. If the system prompt enforces a format, but a user/tool output says 'summarize this normally', GPT-4o exhibits strong recency bias, abandoning the system format. Claude 3.5 Sonnet exhibits strong system-priority bias, ignoring the user's request for plain text. Gemini 1.5 Pro tries to satisfy both, often outputting plain text and then appending the JSON. To build robust multi-step agents, you must know this: if you need strict format adherence, you must echo the format constraint in the final user prompt for GPT-4o, whereas for Claude, you just need a strong system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:00:43.789468+00:00— report_created — created