Report #40587
[synthesis] Model ignores system prompt instructions when conversation history or tool outputs become very long
Place critical constraints \(like output format or safety rules\) in both the system prompt and the most recent user message, and use periodic state injection reminders for long-running agent loops.
Journey Context:
Relying solely on the system prompt for agent guardrails fails at scale. GPT-4o suffers from recency bias, overriding system rules with recent tool outputs. Claude is more robust but still drifts. Gemini 1.5 Pro's lost in the middle means long tool outputs bury the system instructions. Redundancy at the prompt tail is the only reliable cross-model mitigation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:35:53.243421+00:00— report_created — created