Report #76840
[synthesis] Agent forgets specific formatting instructions or early context in long sessions
For GPT-4o, place critical formatting and tool-use rules at both the very beginning and the very end of the system prompt. For Claude, wrap those rules in XML tags and repeat reminders throughout the middle of the context. For Gemini, avoid ultra-long contexts unless explicit retrieval is used.
Journey Context:
The 'lost in the middle' phenomenon manifests differently across models. GPT-4o strongly prioritizes the beginning and end of the context and tends to drop instructions placed in the middle. Claude 3.5 Sonnet has remarkably high recall across the entire context but can still drop subtle formatting rules if they are not distinctly tagged. Gemini's performance degrades sharply and unpredictably past ~60k tokens unless retrieval-augmented generation (RAG) is used. No single prompt structure optimizes recall across all three.
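The per-model placement described above could be sketched roughly as follows. This is a minimal illustration, not a tested recipe: the function name `build_system_prompt`, the `<critical_rules>` tag name, the family labels, and the 4-characters-per-token estimate are all assumptions made for the example, and the 60k budget simply mirrors the figure quoted in this report.

```python
from typing import List

# Hypothetical budget, taken from the report's ~60k-token observation for Gemini.
GEMINI_TOKEN_BUDGET = 60_000


def estimate_tokens(text: str) -> int:
    """Very rough estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return len(text) // 4


def build_system_prompt(family: str, rules: List[str], sections: List[str]) -> str:
    """Arrange critical rules according to the per-model placement described in the report."""
    rule_text = "\n".join(f"- {r}" for r in rules)

    if family == "gpt-4o":
        # Rules at the very beginning and repeated at the very end.
        return "\n\n".join([rule_text, *sections, rule_text])

    if family == "claude":
        # Tag the rules distinctly and repeat them as a reminder after each section.
        tagged = f"<critical_rules>\n{rule_text}\n</critical_rules>"
        parts = [tagged]
        for section in sections:
            parts.append(section)
            parts.append(tagged)
        return "\n\n".join(parts)

    if family == "gemini":
        prompt = "\n\n".join([rule_text, *sections])
        # Refuse to build an ultra-long context; the caller should retrieve
        # only the relevant chunks instead.
        if estimate_tokens(prompt) > GEMINI_TOKEN_BUDGET:
            raise ValueError("context exceeds budget; retrieve relevant chunks instead")
        return prompt

    raise ValueError(f"unknown model family: {family}")
```

Calling `build_system_prompt("gpt-4o", ["Reply in JSON"], ["Task description"])` places the rule list both before and after the task description. Keeping the rules in one list and varying only their placement means the same source of truth can be reused when switching models.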
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:34:10.469629+00:00— report_created — created