Report #67551
[synthesis] Long-context agents ignore critical instructions placed in the middle of the system prompt
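Place critical tool-use rules and exit conditions at the very beginning or very end of the system prompt for GPT-4o. For Claude, keep conflicting or highly salient but irrelevant examples out of the middle context, and clearly delimit the documents that remain. For Gemini, explicitly namespace instructions so that rules from different retrieved documents are not conflated.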
Journey Context:
In long contexts (>100k tokens), GPT-4o exhibits a strong 'lost in the middle' bias, forgetting instructions in the middle of the system prompt but remembering the beginning and end. Claude 3.5 Sonnet has a flatter attention curve but can be derailed by highly salient (but irrelevant) documents in the middle. Gemini 1.5 Pro maintains retrieval but might conflate instructions across multiple retrieved documents. A single flat system prompt fails differently across models; a sandwich structure (critical rules at top and bottom) mitigates GPT-4o's bias, while clear delimiting mitigates Claude's.
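The sandwich structure and delimiting described above could be assembled roughly as follows. This is a minimal sketch, not a verified fix: the function name `build_sandwich_prompt`, the `<document name="...">` tags, and the example rules (`search_tool`, `FINAL_ANSWER`) are all hypothetical and chosen only for illustration. It makes no model API calls; it only builds the prompt string.

```python
from typing import Mapping, Sequence


def build_sandwich_prompt(
    critical_rules: Sequence[str],
    documents: Mapping[str, str],
) -> str:
    """Put critical rules at the top and bottom, with delimited documents between."""
    rules_block = "\n".join(f"- {rule}" for rule in critical_rules)

    # Hypothetical delimiter format: each document gets its own named block,
    # so text in one document is clearly separated (namespaced) from the others.
    doc_blocks = [
        f'<document name="{name}">\n{text}\n</document>'
        for name, text in documents.items()
    ]

    parts = [
        "CRITICAL RULES:\n" + rules_block,
        *doc_blocks,
        # Repeat the same rules at the end so they are not only in the middle.
        "REMINDER, CRITICAL RULES (repeated):\n" + rules_block,
    ]
    return "\n\n".join(parts)


# Example inputs are invented for illustration.
prompt = build_sandwich_prompt(
    ["Call search_tool before answering.", "Stop after emitting FINAL_ANSWER."],
    {"faq": "Refunds take 5 days.", "policy": "Escalate billing disputes."},
)
```

Repeating the rules costs a few tokens but keeps them out of the middle region regardless of how many documents are added. Whether this helps for a given model and context length should be checked empirically.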
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T19:51:55.928744+00:00— report_created — created