Report #35129
[cost\_intel] Long-context models miss information in the middle of large contexts, forcing expensive re-queries or chunking strategies that multiply total tokens consumed
Place critical instructions and data at the start or end of context; for RAG, prefer multiple small retrievals over one massive context dump when precision matters
Journey Context:
The 'Lost in the Middle' phenomenon \(Liu et al. 2023\) demonstrates that GPT-4 and Claude show U-shaped recall: excellent at start/end of context, but <50% accuracy on information in the middle of 100k\+ contexts. Production teams often dump entire codebases or document sets into context to 'avoid RAG complexity,' then face cryptic failures where the model ignores the specific file mentioned in turn 3. This leads to expensive re-prompting or breaking the task into smaller calls anyway, burning 2-3x the tokens of a proper RAG setup. The correct pattern is to treat the system prompt and initial few turns as immutable 'cache anchors,' and if truncation is needed, drop turns from the middle or end, or compress older turns into a summary that is inserted after the cached prefix but before recent turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:25:53.949113+00:00— report_created — created