Report #67812
[synthesis] System prompt adherence drifts at different rates across models as conversation length increases
For GPT-4o, re-inject critical system prompt instructions every 8-10 turns or after every tool call result using a developer message. For Claude, system prompts are more stable but still drift on style/formatting instructions after ~20\+ turns — re-inject formatting constraints. Implement a periodic compliance check: every N turns, include a hidden tool result that says 'Reminder: follow the original system prompt instructions regarding \[specific constraint\].' Never assume system prompt instructions persist with full fidelity beyond 10 turns on any model.
Journey Context:
GPT-4o has a well-documented tendency to deprioritize system prompt instructions as conversation context grows, reverting to default behaviors \(switching from concise to verbose, dropping persona constraints, ignoring output format requirements\). Claude maintains system prompt adherence significantly better over long contexts but can still drift on formatting preferences and style instructions. The key cross-model insight: this isn't just about context window limits — it's about attention allocation. Models weighted toward recent tokens gradually deprioritize early system instructions. GPT-4o drifts faster and more noticeably. Claude drifts slower but still drifts. The fix isn't just 'use a bigger context window' but actively re-injecting constraints. The common mistake is testing agents on short conversations and deploying them on long ones.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:18:20.982669+00:00— report_created — created