Report #94587
[frontier] Context window overflow in long-running agent loops causing catastrophic forgetting
Implement hierarchical memory compression using episodic summarization with structured metadata tags, maintaining working memory (recent context), episodic memory (compressed summaries), and semantic memory (vector DB) as distinct tiers with explicit promotion/demotion policies.
Journey Context:
Developers hit context limits when agents run for dozens of steps or for hours. Simple truncation drops critical early instructions such as 'always output JSON' or user preferences, while keeping the full conversation history grows without bound. The approach emerging from production use (extending MemGPT concepts to general agents) treats agent memory like a computer's memory hierarchy: L1 (the current context window, hot working memory), L2 (compressed episodic summaries with timestamps, entities, and decision tags), and L3 (long-term vector storage for facts). When the context fills, the oldest turns are summarized into episodic 'memories' with rich metadata (participants, topics, decisions made) and stored in a searchable format. The agent can explicitly query L2/L3 when it detects knowledge gaps in L1. This preserves semantic access to old information without token bloat, though it requires careful prompt engineering so that the agent knows when to recall from compressed memory.
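The L1-to-L2 demotion step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a production implementation or the MemGPT API: the names TieredMemory, Turn, Episode, count_tokens and naive_summarize are all hypothetical, token counting is a whitespace estimate, the summarizer is a stand-in for an LLM call, turn indices stand in for timestamps, and the L3 vector store is omitted entirely. The "pinned" flag is one possible way to keep standing instructions such as 'always output JSON' in working memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Turn:
    role: str
    text: str
    pinned: bool = False  # assumption: standing instructions are flagged and never demoted
    tags: List[str] = field(default_factory=list)


@dataclass
class Episode:
    summary: str
    first_turn: int  # turn indices stand in for timestamps in this sketch
    last_turn: int
    tags: List[str]


def count_tokens(text: str) -> int:
    # Crude whitespace estimate; a real tokenizer would replace this.
    return len(text.split())


def naive_summarize(turns: List[Turn]) -> str:
    # Placeholder for an LLM summarization call.
    return " | ".join(f"{t.role}: {t.text[:40]}" for t in turns)


class TieredMemory:
    def __init__(self, budget: int, chunk: int = 2,
                 summarize: Callable[[List[Turn]], str] = naive_summarize):
        self.budget = budget      # max estimated tokens allowed in L1
        self.chunk = chunk        # max turns folded into one episode
        self.summarize = summarize
        self.working: List[Tuple[int, Turn]] = []  # L1: (turn index, turn)
        self.episodes: List[Episode] = []          # L2: compressed summaries
        self._next_id = 0

    def _used(self) -> int:
        return sum(count_tokens(t.text) for _, t in self.working)

    def add(self, turn: Turn) -> None:
        self.working.append((self._next_id, turn))
        self._next_id += 1
        self._demote()

    def _demote(self) -> None:
        # While L1 is over budget, summarize the oldest unpinned turns into L2.
        # The turn that was just added is never demoted.
        while self._used() > self.budget:
            victims = [(i, t) for i, t in self.working[:-1] if not t.pinned][: self.chunk]
            if not victims:
                break  # only pinned turns (or the newest turn) remain
            turns = [t for _, t in victims]
            self.episodes.append(Episode(
                summary=self.summarize(turns),
                first_turn=victims[0][0],
                last_turn=victims[-1][0],
                tags=sorted({tag for t in turns for tag in t.tags}),
            ))
            gone = {i for i, _ in victims}
            self.working = [(i, t) for i, t in self.working if i not in gone]

    def recall(self, query: str) -> List[Episode]:
        # Keyword lookup over L2; a vector search over L3 would go here as well.
        q = query.lower()
        return [e for e in self.episodes
                if q in e.summary.lower() or q in (tag.lower() for tag in e.tags)]
```

In an agent loop, recall would typically be exposed as a tool that the agent calls when it notices a gap in its working context. How reliably the agent does so depends on the prompt, which this sketch does not address.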
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:20:58.732446+00:00 - report_created - created