Report #73724
[frontier] Agent context windows fill up during long conversations, losing critical early instructions
Implement rolling summarization: when the token count exceeds a threshold, summarize the oldest messages into compressed 'memory packets' stored in a vector DB, keeping only recent turns and summaries in active context.
Journey Context:
Simple truncation removes old messages indiscriminately, so agents forget initial system instructions or user constraints. The 2025 production pattern implements hierarchical memory with three tiers: active context (recent turns), working memory (summarized older turns), and reference memory (vector DB). When active context exceeds 50% of the limit, the oldest 20% is passed to a lightweight summarization LLM with instructions to preserve 'decision-relevant facts' and 'user constraints'. The resulting summary is stored with metadata in the vector store, and a reference pointer replaces the raw messages in context. This maintains semantic continuity without token bloat, which is crucial for multi-hour agent sessions where early instructions (like 'always verify X') must survive hours of intermediate conversation.
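The compaction step described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's implementation: `RollingContext`, `Message`, `count_tokens`, and the `summarize` callback are hypothetical names, the word-count tokenizer is a crude stand-in for a real one, a plain dict stands in for the vector DB, and "oldest 20%" is interpreted here as 20% of the raw (not yet summarized) messages.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    role: str  # "system", "user", "assistant", or "memory" (a summary pointer)
    text: str


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


@dataclass
class RollingContext:
    limit: int                                 # context limit, in tokens
    summarize: Callable[[List[Message]], str]  # hypothetical summarization-LLM call
    trigger: float = 0.5                       # compact above 50% of the limit
    evict: float = 0.2                         # share of oldest raw messages to summarize
    active: List[Message] = field(default_factory=list)
    store: Dict[str, dict] = field(default_factory=dict)  # stand-in for a vector DB

    def tokens(self) -> int:
        return sum(count_tokens(m.text) for m in self.active)

    def add(self, msg: Message) -> None:
        self.active.append(msg)
        if self.tokens() > self.trigger * self.limit:
            self.compact()

    def compact(self) -> None:
        # Indices of raw messages, oldest first. Earlier summaries are left
        # alone and the newest message is never evicted.
        raw = [i for i, m in enumerate(self.active[:-1]) if m.role != "memory"]
        if not raw:
            return
        chosen = raw[: max(1, int(len(raw) * self.evict))]
        batch = [self.active[i] for i in chosen]

        packet_id = f"mem-{len(self.store)}"
        summary = self.summarize(batch)
        # Store the summary with metadata and the raw text for later retrieval.
        self.store[packet_id] = {
            "summary": summary,
            "raw": [m.text for m in batch],
            "n_messages": len(batch),
        }

        # Replace the evicted messages with a single reference pointer, placed
        # where the first evicted message used to be.
        drop = set(chosen)
        kept = [m for i, m in enumerate(self.active) if i not in drop]
        kept.insert(chosen[0], Message("memory", f"[{packet_id}] {summary}"))
        self.active = kept
```

In a real deployment the `summarize` callback would wrap a call to the lightweight summarization model, with a prompt that asks it to preserve decision-relevant facts and user constraints. This sketch compacts at most once per added message, so a caller that needs a hard ceiling would have to loop until the token count drops below the threshold.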
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:20:29.973248+00:00 - report_created - created