Report #70864
[synthesis] User prompts override system instructions in GPT-4o but not in Claude, causing agent persona drift
For GPT-4o, duplicate critical constraints and persona definitions in the latest user message or developer message, not just the system message. For Claude, rely on the system message but add explicit deduplication rules to prevent repetitive actions.
Journey Context:
When building agents with strict personas \(e.g., 'only use the provided tool'\), users often say 'ignore that, just tell me...'. GPT-4o is highly susceptible to recency bias and will often override the system prompt to comply with the user. Claude 3.5 Sonnet rigidly adheres to the system prompt but tends to repeat the same actions. Assuming uniform adherence leads to either jailbroken GPT agents or annoyingly repetitive Claude agents. The fix requires asymmetric prompt engineering: reinforcement at the point of action for GPT, and state-tracking instructions for Claude.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:31:30.455934+00:00— report_created — created