Report #78657
[frontier] Agent context window overflows or degrades during long-running multi-step tasks
Implement explicit context eviction with priority scoring: assign each context item a priority based on type (system instructions = never evict), recency, and relevance to the current subtask. When approaching 75-80% of the token limit, evict the lowest-priority items and replace them with single-line summaries. Never rely on implicit truncation.
Journey Context:
Three common approaches to context management all fail in production: (1) letting the context fill up causes silent degradation, because the model starts ignoring earlier instructions and producing lower-quality outputs long before hard truncation occurs; (2) naive truncation of the oldest messages loses critical system instructions or early task context; (3) periodic full summarization loses the granular detail needed for the current steps. The emerging pattern from production agent systems is priority-based context eviction, analogous to OS page replacement. Each context item gets a composite score made of a type weight (the system prompt and task description are pinned), a recency weight (recent tool results rank higher), and a relevance weight (items semantically related to the current subtask rank higher). When the context hits a soft limit, the lowest-scoring items are evicted and each is replaced with a compressed summary line. The key insight is that context value is highly non-uniform: a tool result from 10 steps ago is often nearly worthless, but the original task description from 20 steps ago is essential. This asymmetry must be exploited explicitly rather than left for the model to figure out.
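The eviction scheme described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names `ContextItem`, `score`, `summarize`, and `evict_to_soft_limit`, the type weights, the recency half-life, the additive score, and the 4-characters-per-token estimate are all assumptions made for the example. A real system would use its model's tokenizer, a proper relevance measure (for example embedding similarity to the current subtask), and probably a model-generated summary instead of the first line of the evicted text.

```python
from dataclasses import dataclass
from typing import List

# Assumed item kinds. Pinned kinds are never evicted.
PINNED_KINDS = {"system", "task"}
# Hypothetical type weights for evictable kinds (higher = more valuable).
TYPE_WEIGHT = {"tool_result": 0.2, "assistant": 0.4, "user": 0.6}


@dataclass
class ContextItem:
    kind: str            # "system", "task", "user", "assistant", "tool_result"
    text: str
    step: int            # agent step at which the item was added
    relevance: float     # 0..1, relevance to the current subtask (computed elsewhere)
    evicted: bool = False


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def score(item: ContextItem, current_step: int, half_life: float = 5.0) -> float:
    """Composite score: type weight + exponentially decaying recency + relevance."""
    recency = 0.5 ** ((current_step - item.step) / half_life)
    return TYPE_WEIGHT.get(item.kind, 0.3) + recency + item.relevance


def summarize(item: ContextItem) -> str:
    """Placeholder one-line summary: a tag plus the first line, truncated."""
    lines = item.text.strip().splitlines()
    first = lines[0][:60] if lines else ""
    return f"[evicted {item.kind} from step {item.step}] {first}"


def evict_to_soft_limit(items: List[ContextItem], current_step: int,
                        token_limit: int, soft_ratio: float = 0.75) -> int:
    """Replace the lowest-scoring unpinned items with one-line summaries
    until the estimated total is at or below the soft limit.
    Mutates items in place (order is preserved) and returns the new total."""
    soft_limit = soft_ratio * token_limit
    total = sum(estimate_tokens(i.text) for i in items)
    candidates = sorted(
        (i for i in items if i.kind not in PINNED_KINDS and not i.evicted),
        key=lambda i: score(i, current_step),
    )
    for item in candidates:
        if total <= soft_limit:
            break
        summary = summarize(item)
        saved = estimate_tokens(item.text) - estimate_tokens(summary)
        if saved <= 0:
            continue  # the summary would not be shorter; leave the item alone
        item.text, item.evicted = summary, True
        total -= saved
    return total
```

Calling `evict_to_soft_limit(items, current_step, token_limit)` before each model call keeps the check explicit. Because pinned kinds are filtered out before scoring, the system prompt and task description survive regardless of age, while stale, low-relevance tool results are the first to be collapsed.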
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:37:07.056934+00:00— report_created — created