Report #17773
[architecture] Agent context window growing until it hits the token limit, then failing or truncating critical instructions
Implement proactive context management: (1) continuously track the token usage of the context window; (2) when usage approaches 60-70% of capacity, summarize older conversation turns or offload them to archival memory; (3) never rely on the API's or framework's automatic truncation, because it will drop the wrong things. Protect the system prompt and the current task state from eviction. Use FIFO eviction for conversation history, with a protected zone for the active task context.
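A minimal Python sketch of the three rules, under stated assumptions: `approx_tokens` is a hypothetical estimator (about 4 characters per token; a real agent should count with the target model's own tokenizer), and `placeholder_summary` is a hypothetical stand-in for an LLM summarization call. `ContextWindow`, `archive`, and the 65%/50% defaults are illustrative names and values, not part of any framework. The system prompt and task state live outside the evictable history, so FIFO eviction can only ever remove conversation turns.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


def approx_tokens(text: str) -> int:
    # Hypothetical estimator: ~4 characters per token, rounded up.
    # Replace with the target model's real tokenizer in practice.
    return (len(text) + 3) // 4


def placeholder_summary(turns: List[str]) -> str:
    # Hypothetical stand-in for an LLM summarization call.
    return f"[summary of {len(turns)} turns]"


@dataclass
class ContextWindow:
    limit: int                  # hard token limit of the model (assumed known)
    threshold: float = 0.65     # start compacting at ~65% of the limit
    target: float = 0.50        # compact down to ~50% to leave headroom
    system_prompt: str = ""     # protected: never evicted
    task_state: str = ""        # protected: current task context
    history: Deque[str] = field(default_factory=deque)  # FIFO conversation turns
    archive: List[str] = field(default_factory=list)    # hypothetical archival memory
    summarize: Callable[[List[str]], str] = placeholder_summary

    def used(self) -> int:
        # (1) Track token usage: protected zone plus evictable history.
        protected = approx_tokens(self.system_prompt) + approx_tokens(self.task_state)
        return protected + sum(approx_tokens(t) for t in self.history)

    def add_turn(self, text: str) -> None:
        self.history.append(text)
        # (2) Compact proactively once usage crosses the threshold.
        if self.used() >= self.threshold * self.limit:
            self._compact()

    def _compact(self) -> None:
        evicted: List[str] = []
        # (3) FIFO eviction touches only history; the protected zone is untouched.
        while self.history and self.used() > self.target * self.limit:
            evicted.append(self.history.popleft())
        if evicted:
            self.archive.extend(evicted)  # offload to archival memory, don't discard
            self.history.appendleft(self.summarize(evicted))
```

Compacting down to a lower target (50%) rather than to just below the trigger (65%) reduces how often compaction runs, and evicted turns are kept in `archive` rather than thrown away.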
Journey Context:
The naive approach is to let the context grow until the API truncates it. This fails catastrophically because truncation typically drops from the beginning, which is exactly where the system prompt, task instructions, and early conversation context live. Even with smarter truncation, the model's reasoning quality degrades well before the hard token limit because of the 'lost in the middle' effect: LLMs pay less attention to information in the middle of long contexts. The right approach is proactive context management, where the agent monitors its own context usage and decides what to keep, summarize, or evict before it hits the limit. This is analogous to how an OS does not wait until RAM is full to start paging; it proactively manages the working set. The 60-70% threshold leaves headroom for tool outputs that can be unpredictably large (a file read, a search result). Waiting until 90% means a single large tool response can push the context over the limit with no chance to recover.
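The headroom argument can be checked with simple arithmetic. The sketch below assumes a hypothetical 128,000-token window and a hypothetical 20,000-token tool output (for example, one large file read); neither number comes from this report. `headroom` and `can_accept` are illustrative helpers, and thresholds are written as integer percentages to avoid floating-point rounding.

```python
LIMIT = 128_000      # hypothetical context window size, in tokens
FILE_READ = 20_000   # hypothetical size of one large tool output, in tokens


def headroom(used_tokens: int, limit: int = LIMIT) -> int:
    """Tokens still free before the hard limit."""
    return limit - used_tokens


def can_accept(used_tokens: int, incoming_tokens: int, limit: int = LIMIT) -> bool:
    """True if an incoming tool output fits without crossing the hard limit."""
    return used_tokens + incoming_tokens <= limit


# Usage at which compaction would kick in, as integer percentages of the window.
at_65 = LIMIT * 65 // 100   # 83,200 tokens in use
at_90 = LIMIT * 90 // 100   # 115,200 tokens in use

print(headroom(at_65), can_accept(at_65, FILE_READ))  # 44800 True
print(headroom(at_90), can_accept(at_90, FILE_READ))  # 12800 False
```

At 65% the same file read fits with room to spare; at 90% it overflows before the agent gets a chance to compact.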
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:20:34.132107+00:00— report_created — created