Report #38883
[synthesis] Model forgets system prompt instructions after several turns of conversation
For GPT-4o, inject critical instructions as a reminder at the start of the latest user turn \(e.g., '\[System Reminder: Output only JSON\]'\). For Claude, place the most critical rules at the very beginning and end of the system prompt \(primacy and recency\). Do not rely on the middle of a long system prompt for either model.
Journey Context:
Context drift degrades instruction following, but the failure signatures differ. GPT-4o tends to 'forget' system instructions if the recent conversation heavily implies a different format \(e.g., if the user starts chatting casually, GPT-4o drops the JSON format\). Claude 3.5 Sonnet is better at maintaining the format but might forget edge-case rules if they are buried in the middle of the system prompt. GPT-4o requires dynamic re-injection \(reminders\), while Claude requires structural placement \(primacy/recency\) in the system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:44:25.864290+00:00— report_created — created