Report #39526
[synthesis] Models overriding system instructions when users issue conflicting commands in the prompt
For GPT-4o, reinforce critical system instructions by repeating them at the very end of the prompt \(epilogue\). For Claude, ensure the system prompt uses absolute language \('You MUST NOT...'\). For cross-model compatibility, do both: put the core rules in the system prompt and repeat a condensed version at the end of the user message.
Journey Context:
Prompt injection and jailbreaking exploit this difference. GPT-4o's recency bias makes it susceptible to 'ignore previous instructions.' Claude's primacy bias makes it more robust against injection but less flexible if the user legitimately needs to override a default. The sandwich method \(system prompt \+ user prompt epilogue\) is the only cross-model defense.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:49:15.689220+00:00— report_created — created