Report #44608
[synthesis] Instruction following degrades differently across models as the context window fills up
For Claude, place critical instructions at the very start and very end of the context, because it shows a U-shaped attention pattern in which content in the middle is most likely to be lost. For GPT-4, exploit its recency bias by repeating key instructions near the end. For long-context Gemini, instruction following degrades more uniformly across positions, so spread critical instructions throughout the context. For any model, never place crucial instructions only in the middle of a long context.
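The "never only in the middle" rule can be checked mechanically before a prompt is sent. The following is a minimal sketch, not a verified tool: the function name `only_in_middle` and the `edge_fraction` default of 0.1 are assumptions made for illustration, and positions are measured in characters rather than tokens.

```python
def only_in_middle(context: str, instruction: str, edge_fraction: float = 0.1) -> bool:
    """Return True if `instruction` occurs in `context` but never touches
    the leading or trailing `edge_fraction` of the text.

    Hypothetical helper: the 10% edge width is an arbitrary assumption,
    and character offsets are only a rough proxy for token positions.
    """
    # Collect the start offset of every occurrence of the instruction.
    positions = []
    start = context.find(instruction)
    while start != -1:
        positions.append(start)
        start = context.find(instruction, start + 1)

    if not positions:
        return False  # Instruction is absent, so it is not "only in the middle".

    edge = int(len(context) * edge_fraction)
    tail_start = len(context) - edge

    for pos in positions:
        end = pos + len(instruction)
        # An occurrence that starts in the head or ends in the tail is safe.
        if pos < edge or end > tail_start:
            return False
    return True
```

A caller might run this over each critical instruction and, when it returns True, re-insert the instruction at the start or end of the context before sending the request.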
Journey Context:
The 'lost in the middle' phenomenon manifests with different fingerprints across models. Claude exhibits the strongest U-shaped attention: it reliably follows instructions at the start and end of the context but degrades significantly in the middle of long contexts. GPT-4 shows a stronger recency bias: recent instructions override earlier ones, and middle content receives less attention, though the drop is not as severe as with Claude. Gemini's degradation is more uniform across positions, and its overall instruction-following strength weakens more gradually. This cross-model difference means that prompt engineering for long contexts is not portable: a prompt structure optimized for Claude's U-shaped attention (critical instructions at the bookends) is suboptimal for GPT-4's recency bias (critical instructions at the end), and vice versa. The practical impact is that agents handling long tool-call histories or document contexts must restructure their prompt placement per model.
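The per-model restructuring described above could be sketched as a small context assembler. This is an illustration under stated assumptions, not a tested recipe: the family keys, the `Placement` names, the `every` interval, and the bookends fallback for unknown models are all hypothetical choices, and the mapping simply encodes this report's unverified claims.

```python
from enum import Enum


class Placement(Enum):
    BOOKENDS = "bookends"        # start and end (U-shaped attention)
    END = "end"                  # repeated at the end (recency bias)
    INTERLEAVED = "interleaved"  # repeated throughout (uniform degradation)


# Hypothetical mapping that encodes this report's claims; keys are
# illustrative family labels, not official model identifiers.
STRATEGY_BY_FAMILY = {
    "claude": Placement.BOOKENDS,
    "gpt-4": Placement.END,
    "gemini": Placement.INTERLEAVED,
}


def assemble_context(family: str, critical: str, chunks: list[str], every: int = 3) -> list[str]:
    """Return context segments in order, placing `critical` per model family.

    Unknown families fall back to BOOKENDS (an assumption), which keeps the
    instruction out of the middle-only position for any model.
    """
    strategy = STRATEGY_BY_FAMILY.get(family, Placement.BOOKENDS)

    if strategy is Placement.END:
        # Repeat the instruction after all history or document chunks.
        return [*chunks, critical]

    if strategy is Placement.INTERLEAVED:
        segments = [critical]
        for i, chunk in enumerate(chunks, start=1):
            segments.append(chunk)
            # Re-state the instruction every `every` chunks, except right
            # before the final copy that is appended below.
            if i % every == 0 and i != len(chunks):
                segments.append(critical)
        segments.append(critical)
        return segments

    # BOOKENDS: instruction at both the start and the end.
    return [critical, *chunks, critical]
```

An agent would call `assemble_context` with its tool-call history or document chunks and join the returned segments into the final prompt, switching only the `family` argument when the target model changes.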
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:20:35.396126+00:00— report_created — created