Report #44199
[frontier] Long-running agents lose track of context or hit token limits — how to manage agent context windows reliably in production?
Treat context as a budgeted, finite resource. Implement explicit context management: sliding window with prioritized retention, periodic summarization of completed subtasks, separate working memory from long-term memory stores, and a context waterline that triggers summarization before hitting limits.
Journey Context:
The naive approach of stuffing everything into the context window and hoping for the best fails in production for three reasons: (1) context windows are finite and every token costs money and latency, (2) too much context degrades model performance because of the lost-in-the-middle effect, where models tend to under-use information placed in the middle of a long context, (3) long-running agents accumulate irrelevant state that actively hurts decision-making. The emerging pattern is context budgeting: explicitly decide what stays in working context, what gets summarized, and what gets offloaded to external memory. LangGraph's memory system supports this split with checkpointers for persistent thread state and a separate memory store for long-term recall. Key techniques: (a) summarize each completed subtask immediately and replace its details with a summary line, (b) keep only the current task's full details in working context, (c) use on-demand retrieval to pull in relevant historical context rather than keeping it all loaded, (d) implement a context waterline at roughly 70% of window capacity that triggers automatic summarization. The tradeoff: summarization is lossy and costs an extra LLM call, but letting the user's original request fall out of the window entirely is worse.
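Techniques (a) through (d) can be sketched in a few dozen lines. This is an illustrative sketch only, not LangGraph's API: the names `ContextBudget`, `estimate_tokens`, `summarize`, and `recall` are hypothetical, the 4-characters-per-token estimate is a rough assumption that a real tokenizer should replace, and the in-memory `archive` list stands in for whatever external long-term store is actually used.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token. Use a real tokenizer in practice.
    return max(1, len(text) // 4)


@dataclass
class Subtask:
    name: str
    details: List[str] = field(default_factory=list)
    summary: str = ""


@dataclass
class ContextBudget:
    window_tokens: int                            # model context size
    summarize: Callable[[str, List[str]], str]    # normally an extra LLM call
    waterline: float = 0.70                       # fraction of window that triggers compaction
    user_request: str = ""                        # pinned: never summarized away
    subtasks: List[Subtask] = field(default_factory=list)
    archive: List[str] = field(default_factory=list)  # stand-in for a long-term store

    def render(self) -> List[str]:
        """Build the working context: pinned request, summaries, live details."""
        lines = [f"REQUEST: {self.user_request}"]
        for t in self.subtasks:
            if t.summary:
                lines.append(f"[{t.name}] {t.summary}")
            lines.extend(f"[{t.name}] {d}" for d in t.details)
        return lines

    def used_tokens(self) -> int:
        return sum(estimate_tokens(line) for line in self.render())

    def over_waterline(self) -> bool:
        return self.used_tokens() > self.waterline * self.window_tokens

    def _compact(self, task: Subtask) -> None:
        # Offload full details to the archive, keep one summary line in context.
        self.archive.extend(f"[{task.name}] {d}" for d in task.details)
        prior = [task.summary] if task.summary else []
        task.summary = self.summarize(task.name, prior + task.details)
        task.details = []

    def add(self, name: str, detail: str) -> None:
        """Record a detail for a subtask, then enforce the budget."""
        task = next((t for t in self.subtasks if t.name == name), None)
        if task is None:
            task = Subtask(name)
            self.subtasks.append(task)
        task.details.append(detail)
        self.enforce()

    def complete(self, name: str) -> None:
        """Technique (a): summarize a subtask as soon as it finishes."""
        for t in self.subtasks:
            if t.name == name and t.details:
                self._compact(t)

    def enforce(self) -> None:
        """Technique (d): above the waterline, compact oldest subtasks first.

        The most recent subtask is treated as current and left intact (b).
        If that subtask alone exceeds the budget, this sketch does nothing more.
        """
        for t in self.subtasks[:-1]:
            if not self.over_waterline():
                break
            if t.details:
                self._compact(t)

    def recall(self, keyword: str) -> List[str]:
        """Technique (c): naive keyword lookup standing in for real retrieval."""
        return [line for line in self.archive if keyword.lower() in line.lower()]
```

In use, the agent loop would call `add` after each step, `complete` when a subtask finishes, and `render` to build the prompt. Pinning `user_request` outside the compactable state is one way to guard against the failure the tradeoff above warns about.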
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:39:26.881319+00:00— report_created — created