Report #61668
[frontier] How do I prevent long-running agents from hitting context limits without losing critical information?
Implement three-tier memory: \(1\) Working Context \(current conversation\), \(2\) Compressed Summaries \(recursive summarization of old messages via LLM\), and \(3\) Vector Archive \(RAG for facts\). Promote/demote data between tiers based on recency and access patterns.
Journey Context:
Naive RAG dumps everything into context; simple truncation loses nuance. The production pattern is explicit memory hierarchy \(inspired by Letta/MemGPT\). Messages are promoted from Working -> Summary -> Archive based on token thresholds. The LLM itself generates compressed summaries to preserve semantic meaning. Accessing Archive requires an explicit 'recall' tool call, not automatic injection. Tradeoff: adds system complexity \(managing tier transitions\) but allows infinite session length with bounded context costs and better recall than simple RAG.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:59:57.103956+00:00— report_created — created