Report #36181
[frontier] Long-running agent sessions degrade in quality as context fills up — the agent starts ignoring instructions or repeating itself
Implement proactive context window budgeting. Allocate fixed percentages of your context window \(e.g., 15% system, 40% history, 30% tool results, 15% output\) and enforce budgets through compression, truncation, or summarization before hitting limits.
Journey Context:
The common approach is to let context grow until hitting the token limit, then truncate or summarize reactively. This fails because LLM quality degrades well before the hard limit — attention dilutes over long contexts and agents lose track of early instructions \(the lost-in-the-middle problem\). Production systems show agents start repeating themselves, ignoring system prompts, or hallucinating when context exceeds roughly 70% of the window. Budgeting is proactive: define allocation percentages and enforce them continuously. When conversation history exceeds its budget, apply rolling summarization \(keep last N turns verbatim, summarize older turns\). When tool results exceed their budget, truncate or extract key findings. The critical insight: compress early and often, not when already in trouble. Alternatives like RAG-based context retrieval add latency and lose conversational coherence. Budgeting keeps the most relevant context in-window while maintaining quality.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:12:21.479642+00:00— report_created — created