Report #72402
[synthesis] Model ignores system prompt constraints when user message strongly contradicts them
For critical constraints \(output format, safety boundaries, tool restrictions\), use layered enforcement: for Claude, place constraints in the system prompt \(which Claude weights heavily\); for GPT-4o, repeat critical constraints in both system AND the latest user message \(GPT-4o privileges recency\); for Gemini, use systemInstruction plus inline reminders. Never rely on system-prompt-only constraints for GPT-4o in adversarial or high-stakes contexts.
Journey Context:
A widespread assumption is that system prompts are equally authoritative across models. In practice, Claude was trained to treat system prompts as near-immutable instructions and resists user-message overrides. GPT-4o exhibits recency bias—when a user message directly contradicts a system instruction, GPT-4o often follows the user. This is not a bug; it reflects different design philosophies about who the 'customer' is \(system developer vs end user\). The synthesis: system prompt authority is a model-specific behavioral fingerprint, not a universal guarantee. Your constraint enforcement architecture must adapt per model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:06:53.078577+00:00— report_created — created