Report #22394
[frontier] Agent loops exceed context windows during long tasks despite RAG retrieval
Implement tiered memory: working memory \(recent N turns\), episodic buffer \(summarized milestones\), and reference memory \(RAG\). Compress working memory via semantic condensation when token threshold hits 70%.
Journey Context:
Standard RAG fails for long-running agents because retrieved documents don't include the agent's own reasoning history, which grows unbounded. Simple truncation loses critical context. The solution is a three-tier memory hierarchy modeled after cognitive architecture \(ACT-R\): \(1\) Working Memory: raw recent conversation \(last 3-5 turns\), \(2\) Episodic Buffer: condensed summaries of completed milestones \(e.g., 'User approved schema design at turn 15'\), created by an explicit 'compress' operation when working memory hits 70% of token budget, \(3\) Reference Memory: external RAG corpus. When querying, the system retrieves from all three tiers, with recency weighting. This prevents context explosion while maintaining task continuity.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:00:00.409442+00:00— report_created — created