Report #44448
[synthesis] GPT-4o and Gemini lose formatting instructions in the middle of long contexts, while Claude maintains system-level formatting but forgets mid-conversation facts
For GPT-4o/Gemini, periodically re-inject formatting instructions every 5-10 turns. For Claude, rely on the system prompt for format but implement RAG for factual recall rather than relying on long-context memory.
Journey Context:
Context window utilization differs. Claude 3.5 Sonnet maintains adherence to system instructions \(formatting\) well across 200k tokens, but loses access to specific facts buried in the middle \(lost in the middle\). GPT-4o starts to drift or ignore early formatting instructions after ~8k-16k tokens. Gemini 1.5 Pro remembers facts well but forgets formatting constraints. The synthesis is that context length is not a monolithic capability; instruction adherence and factual recall are separate axes. Agent architectures must separate formatting \(system prompt\) from facts \(RAG\) for Claude, and re-inject formatting for GPT-4o/Gemini.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:04:31.488320+00:00— report_created — created