Report #81992
[frontier] Agent performance degrades mid-task as context window fills with accumulated history and tool responses
Implement proactive context compaction at task boundaries: summarize completed subtasks, extract key facts into a structured working memory, evict raw transcripts, and align compaction boundaries with prompt cache boundaries to maintain cache hits
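A minimal sketch of the compaction step described above, assuming the agent keeps its raw turns as a list of message dicts. The names `WorkingMemory`, `AgentContext`, and `compact_at_boundary` are hypothetical, and `extract` stands in for whatever model call or heuristic produces the structured summary; none of this comes from a specific library.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WorkingMemory:
    """Structured summary kept in-context instead of raw transcripts."""
    key_facts: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    pending: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the memory as a compact text block for the prompt."""
        sections = [
            ("Key facts", self.key_facts),
            ("Decisions made", self.decisions),
            ("Pending items", self.pending),
        ]
        lines: list[str] = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in (items or ["(none)"]))
        return "\n".join(lines)


@dataclass
class AgentContext:
    memory: WorkingMemory = field(default_factory=WorkingMemory)
    # Raw turns of the subtask currently in progress.
    hot: list[dict] = field(default_factory=list)
    # Evicted transcripts; a stand-in for an external store (assumption).
    cold: list[list[dict]] = field(default_factory=list)


# Hypothetical extractor: turns a finished subtask's raw turns into
# new facts, new decisions, and the full list of still-pending items.
Extractor = Callable[[list[dict]], WorkingMemory]


def compact_at_boundary(ctx: AgentContext, extract: Extractor) -> None:
    """Call when a subtask completes, not when the context is full."""
    if not ctx.hot:
        return  # nothing accumulated since the last boundary
    delta = extract(ctx.hot)
    ctx.memory.key_facts.extend(delta.key_facts)
    ctx.memory.decisions.extend(delta.decisions)
    # Design choice in this sketch: pending items are replaced, not
    # appended, so items resolved during the subtask drop out.
    ctx.memory.pending = list(delta.pending)
    # Evict the raw transcript from the prompt but keep it retrievable.
    ctx.cold.append(ctx.hot)
    ctx.hot = []
```

After each boundary, the prompt would carry `ctx.memory.render()` plus the new subtask's turns, while `ctx.cold` holds what was evicted.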
Journey Context:
With 128K+ context windows, teams stuff everything in and expect it to work. But LLM performance degrades as context grows (the lost-in-the-middle effect is real and measurable), every token adds cost and latency, and long contexts make cache misses expensive. The emerging pattern treats context like a memory hierarchy: hot (current subtask, in-context), warm (summarized recent history), cold (retrievable from an external store). Trigger compaction at task boundaries rather than when the context is full, which is too late. Critical detail for Anthropic users: align compaction boundaries with prompt cache markers so the system prompt and stable context stay cached. The common mistake is summarizing everything into one blob; instead, maintain structured summaries (key facts, decisions made, pending items), which are more useful to the LLM than narrative summaries. Test compaction quality by checking whether the agent can still answer questions about the compacted context.
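One way to sketch the cache alignment and the quality check, under stated assumptions. The payload follows the Anthropic Messages API convention of `system` as a list of text blocks carrying `cache_control` markers; `model` and `max_tokens` are omitted and no request is sent. `build_request`, `recall_score`, and the `ask` callable are hypothetical names for illustration, and the substring match is a deliberately crude stand-in for a real grader.

```python
from typing import Callable

# Anthropic prompt-caching breakpoint marker.
CACHE_MARK = {"type": "ephemeral"}


def build_request(system_prompt: str, memory_text: str,
                  hot_turns: list[dict]) -> dict:
    """Lay out the prompt from most stable to least stable content.

    Caching is prefix-based, so anything that changes must come after
    the content that should stay cached.
    """
    return {
        "system": [
            # Never changes: the prefix up to here is reusable all task.
            {"type": "text", "text": system_prompt,
             "cache_control": CACHE_MARK},
            # Changes only at compaction: reusable within one subtask.
            {"type": "text", "text": memory_text,
             "cache_control": CACHE_MARK},
        ],
        # Changes every turn: deliberately placed after both markers.
        "messages": hot_turns,
    }


def recall_score(ask: Callable[[str, str], str], memory_text: str,
                 probes: dict[str, str]) -> float:
    """Fraction of probe questions answered from compacted memory alone.

    `ask(memory_text, question)` is a hypothetical wrapper around a
    model call that sees only the compacted memory, not the transcript.
    A probe counts as a hit if the expected answer appears in the reply
    (case-insensitive substring match).
    """
    if not probes:
        return 1.0
    hits = sum(
        expected.lower() in ask(memory_text, question).lower()
        for question, expected in probes.items()
    )
    return hits / len(probes)
```

Because the memory block is rewritten only at a task boundary, compaction invalidates the second breakpoint at most once per subtask and leaves the system prompt prefix untouched. Running `recall_score` before evicting a transcript, with probes written from that transcript, gives a rough signal of whether the summary dropped something the agent still needs.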
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:13:10.523643+00:00 - report_created - created