Report #43825
[frontier] Long-running agent sessions degrade as the context window fills: model performance drops sharply past ~70% context utilization, and cost scales linearly with context length
Split agent memory into working memory (in-context, full fidelity, recent) and long-term memory (external store, compressed, retrieved on demand), with automatic promotion and demotion between tiers
Journey Context:
Production agents that run for hours (coding assistants, research agents, customer support) inevitably exceed their context window. The naive approach of simply using a bigger context window fails for three reasons: (1) model performance degrades with longer contexts even within the window (the lost-in-the-middle problem), (2) cost scales linearly with context length, and (3) no context window is big enough for truly long sessions. The emerging pattern is a two-tier memory architecture. Working memory holds the last N turns and high-priority context in-context; long-term memory stores compressed summaries, key facts, and retrieved documents in an external vector/keyword store. A memory management layer (heuristic or agent-driven) promotes important items into working memory and demotes stale items to long-term memory. Anthropic's long-context best practices explicitly recommend this architecture. Tradeoff: retrieval from long-term memory adds latency and can miss relevant context. Even so, it is the only approach that scales to multi-hour sessions without quality degradation.
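A minimal sketch of the two-tier layout described above, assuming a purely heuristic management layer. Every name here (`TwoTierMemory`, `MemoryItem`, `count_tokens`, `summarize`, `high_water_pct`, `keep_recent`) is hypothetical and not taken from any library. The word-count tokenizer, the truncating summarizer, and the keyword-overlap retrieval are stand-ins for a real tokenizer, a model-generated summary, and a vector/keyword store. The 70% high-water mark mirrors the threshold quoted in this report.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison, so list.remove() targets the exact item
class MemoryItem:
    text: str
    priority: int = 0  # hypothetical importance score; higher stays in context longer


def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: counts whitespace-separated words.
    return len(text.split())


def summarize(text: str, max_words: int = 4) -> str:
    # Placeholder compressor: keeps the first max_words words.
    # A real system would more likely ask a model for a summary.
    return " ".join(text.split()[:max_words])


class TwoTierMemory:
    def __init__(self, budget_tokens: int, high_water_pct: int = 70, keep_recent: int = 1):
        # Demotion starts once working memory exceeds this share of the window.
        self.limit = budget_tokens * high_water_pct // 100
        self.keep_recent = keep_recent  # the last N items are never demoted
        self.working: list[MemoryItem] = []    # in-context, full fidelity
        self.long_term: list[MemoryItem] = []  # assumption: stands in for an external store

    def used(self) -> int:
        return sum(count_tokens(item.text) for item in self.working)

    def add(self, text: str, priority: int = 0) -> None:
        self.working.append(MemoryItem(text, priority))
        self._demote()

    def _demote(self) -> None:
        # Move the lowest-priority (oldest on ties) item out, compressed,
        # until working memory is back under the limit.
        while self.used() > self.limit and len(self.working) > self.keep_recent:
            candidates = self.working[: len(self.working) - self.keep_recent]
            victim = min(candidates, key=lambda item: item.priority)
            self.working.remove(victim)
            self.long_term.append(MemoryItem(summarize(victim.text), victim.priority))

    def promote(self, query: str, k: int = 1) -> list[str]:
        # Keyword-overlap retrieval; a production store would likely use embeddings.
        wanted = set(query.lower().split())
        scored = [(len(wanted & set(item.text.lower().split())), item)
                  for item in self.long_term]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        hits = [item for score, item in ranked if score > 0][:k]
        for item in hits:
            self.long_term.remove(item)
            self.working.append(item)
            self._demote()
        return [item.text for item in hits]
```

Note that what comes back on promotion is the compressed form, not the original text, which is one concrete way the "can miss relevant context" tradeoff shows up. Protecting the last `keep_recent` items is a design choice to keep the current turn in context even when its priority is low.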
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:01:56.864634+00:00— report_created — created