Report #21370
[frontier] Context window overflow causing agent to forget critical instructions mid-task
Implement dynamic token budgeting with hierarchical summarization: Tier 1 (system prompts/tool schemas) is sacred/verbatim; Tier 2 (conversation history) is summarized with LLM calls preserving key facts; Tier 3 (retrieved context) is ranked by relevance and truncated aggressively.
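A minimal sketch of the budgeting step, assuming a fixed context window and a known Tier 1 size. The names `Budget`, `allocate_budget`, `reserve_for_output`, and `history_share` are hypothetical and not from this report; the 50/50 or 60/40 split between Tier 2 and Tier 3 is an illustrative choice, not a recommended value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    tier1: int  # pinned system prompt + tool schemas, never trimmed
    tier2: int  # summarized conversation history
    tier3: int  # retrieved context, trimmed first


def allocate_budget(window: int, tier1_tokens: int, reserve_for_output: int,
                    history_share: float = 0.6) -> Budget:
    """Split a context window across the three tiers.

    Tier 1 gets exactly what it needs. Whatever is left after reserving
    room for the model's reply is divided between Tier 2 and Tier 3.
    """
    if not 0.0 <= history_share <= 1.0:
        raise ValueError("history_share must be between 0 and 1")
    remaining = window - tier1_tokens - reserve_for_output
    if remaining < 0:
        # Assumption: failing loudly is preferable to silently trimming Tier 1.
        raise ValueError("Tier 1 plus output reserve exceeds the context window")
    tier2 = int(remaining * history_share)
    return Budget(tier1=tier1_tokens, tier2=tier2, tier3=remaining - tier2)
```

Recomputing the budget on every turn is what makes it dynamic: if tool schemas are added mid-task, Tier 1 grows and the other two tiers shrink to compensate.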
Journey Context:
When agents hit token limits, naive truncation (dropping the oldest messages) often removes the original task instruction or tool definitions, causing the agent to hallucinate or loop. The wrong fix is simply using larger context windows (128k+), which increases latency and cost while still eventually hitting limits on long tasks. The production-hardened pattern is tiered memory management modeled after operating system cache hierarchies. Tier 1 (instruction cache) is pinned and never evicted. Tier 2 (working memory) uses an LLM-based summarizer that compresses turn N-1 into key-value facts before adding turn N. Tier 3 (external memory) uses retrieval scores to drop low-relevance chunks first. Because Tier 1 is never evicted, the agent never loses its 'identity' or tool schemas, reportedly reducing task failure rates by 3-4x in long-horizon benchmarks.
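The three tiers can be sketched as follows. This is an illustration under stated assumptions, not the reporter's implementation: `TieredContext`, `add_turn`, and `build` are hypothetical names, `summarize` is a caller-supplied callable that would wrap an LLM call in practice, and `count_tokens` uses a whitespace word count as a stand-in for the model's real tokenizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count approximates a real tokenizer.
    return len(text.split())


@dataclass
class TieredContext:
    pinned: List[str]                     # Tier 1: system prompt, tool schemas
    summarize: Callable[[str, str], str]  # (old_summary, turn) -> new summary
    summary: str = ""                     # Tier 2: compressed older turns
    latest_turn: str = ""                 # Tier 2: most recent turn, verbatim

    def add_turn(self, turn: str) -> None:
        # Compress turn N-1 into the running summary before storing turn N.
        if self.latest_turn:
            self.summary = self.summarize(self.summary, self.latest_turn)
        self.latest_turn = turn

    def build(self, retrieved: List[Tuple[float, str]], budget: int) -> List[str]:
        # Tier 1 and Tier 2 are always included; only Tier 3 is trimmed here.
        parts = list(self.pinned)
        parts += [p for p in (self.summary, self.latest_turn) if p]
        used = sum(count_tokens(p) for p in parts)
        # Tier 3: take chunks from highest score down and stop at the first
        # one that does not fit, so the lowest-relevance chunks drop first.
        for _score, chunk in sorted(retrieved, key=lambda r: r[0], reverse=True):
            cost = count_tokens(chunk)
            if used + cost > budget:
                break
            parts.append(chunk)
            used += cost
        return parts
```

Note that `build` never removes anything from `pinned`, even when Tier 1 alone exceeds `budget`; that case has to be caught when the budget is sized, not at assembly time.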
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:16:45.171972+00:00— report_created — created