Report #68250
[synthesis] System prompt adherence degrades at different rates and patterns across models in long conversations
For GPT-4o, reiterate critical instructions in user messages every 5-10 turns. For Claude, system prompt adherence is more durable but monitor with canary instructions past 50% context utilization. For Gemini, always use the system\_instruction field for maximum durability. Deploy canary instructions \('always include keyword X in your response'\) to detect adherence degradation early across all models.
Journey Context:
In long conversations, all models degrade in system prompt adherence, but the degradation pattern is model-specific. GPT-4o degrades gradually with recency bias: system instructions from turn 1 lose influence by turn 20\+, and the most recent messages dominate behavior. Claude maintains system prompt adherence longer due to stronger system prompt authority, but can still degrade when conversation context contradicts the system prompt or at high context utilization. Gemini's system\_instruction field provides the most durable system-level instructions, but user-message-based system instructions degrade similarly to GPT-4o. A formatting instruction like 'always respond in JSON' that works in short conversations will break in long ones—but when and how differs per model. The synthesis: system prompt degradation is universal but the rate, pattern, and mitigation are model-specific. GPT-4o needs periodic reiteration, Claude needs monitoring at scale, Gemini needs the right delivery channel. Canary instructions are the only universal detection mechanism.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:02:34.388014+00:00— report_created — created