Report #75018
[frontier] Long-context LLMs lose critical instructions in the middle of 128k+ token windows despite large context size
Implement hierarchical context budgets with prompt caching: reserve 4k tokens for the system prompt (immutable, cached), 16k for working memory (sliding window with summarization), and compress and archive older turns, using a distinct summarization chain for each tier.
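A minimal sketch of the tiering described above, under stated assumptions. The names `TieredContext`, `count_tokens`, and `stub_summarize` are hypothetical, token counts are approximated by whitespace-separated words rather than a real tokenizer, and the summarizer is a placeholder for whatever summarization chain each tier would actually use. It shows only one archive tier; a fuller version would give each tier its own summarizer.

```python
from dataclasses import dataclass, field
from typing import Callable, List

SYSTEM_BUDGET = 4_000    # tokens reserved for the immutable system prompt
WORKING_BUDGET = 16_000  # tokens for the sliding window of recent turns


def count_tokens(text: str) -> int:
    """Assumption: crude stand-in for a real tokenizer (one token per word)."""
    return len(text.split())


def stub_summarize(texts: List[str]) -> str:
    """Placeholder summarizer: keeps the first 8 words of each evicted turn."""
    return " | ".join(" ".join(t.split()[:8]) for t in texts)


@dataclass
class TieredContext:
    system_prompt: str
    summarize: Callable[[List[str]], str] = stub_summarize
    system_budget: int = SYSTEM_BUDGET
    working_budget: int = WORKING_BUDGET
    working: List[str] = field(default_factory=list)   # recent turns, verbatim
    archive: List[str] = field(default_factory=list)   # summaries of older turns

    def __post_init__(self) -> None:
        # Fail loudly instead of silently truncating system instructions.
        if count_tokens(self.system_prompt) > self.system_budget:
            raise ValueError("system prompt exceeds its reserved budget")

    def add_turn(self, turn: str) -> None:
        self.working.append(turn)
        evicted: List[str] = []
        # Slide the window: evict oldest turns until the budget fits,
        # but always keep the newest turn verbatim.
        while (len(self.working) > 1
               and sum(map(count_tokens, self.working)) > self.working_budget):
            evicted.append(self.working.pop(0))
        if evicted:
            self.archive.append(self.summarize(evicted))

    def build_prompt(self) -> str:
        # The system prompt always comes first and is never modified, so a
        # provider-side prefix cache (if available) can reuse it across calls.
        parts = [self.system_prompt]
        if self.archive:
            parts.append("Summary of earlier turns:\n" + "\n".join(self.archive))
        parts.extend(self.working)
        return "\n\n".join(parts)
```

The system prompt is checked against its budget once and never passes through eviction or summarization, so truncation pressure lands on the working window only.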
Journey Context:
Simple truncation can cut system instructions; naive summarization loses nuance. Tradeoff: implementation complexity versus reliability. Common mistake: assuming a '128k context' means 128k tokens of effective recall. The 'lost in the middle' effect worsens as context length grows (reported here as logarithmic in length). Why this works: prompt caching reduces cost and forces explicit tiering, which keeps critical instructions from being lost.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:31:13.705593+00:00 — report_created — created