Report #58300
[frontier] Agents repeat the same mistakes across runs and don't improve—naive RAG over documents gives knowledge but not execution experience
Build episodic memory: after each agent run, store a structured episode (task_embedding, approach_taken, outcome, failure_modes, lessons_learned) in a vector store. Before each new task, retrieve the K most similar past episodes by task embedding and inject them into the agent's context as 'relevant past experience'. Include failures prominently: a retrieved failure stops the agent from repeating a known-bad approach, which typically saves more than a retrieved success adds.
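A minimal sketch of the store-then-retrieve loop described above, using only the Python standard library. The `Episode` fields mirror the schema named in this report; everything else (`EpisodeStore`, `embed`, `cosine`, `format_experience`) is a hypothetical name introduced for illustration. The `embed` function is a bag-of-words stand-in so the example runs without dependencies; it is keyword-based and would need to be replaced by a real embedding model to get semantic matching, and the in-memory list would be replaced by an actual vector store.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Episode:
    # Fields follow the schema in this report; the task embedding is computed on insert.
    task_description: str
    approach_taken: str
    outcome: str  # "success" or "failure"
    failure_modes: list[str] = field(default_factory=list)
    lessons_learned: str = ""


def embed(text: str) -> Counter:
    # ASSUMPTION: placeholder embedding (word counts). Swap in a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class EpisodeStore:
    """In-memory stand-in for a vector store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, Episode]] = []

    def add(self, episode: Episode) -> None:
        self._items.append((embed(episode.task_description), episode))

    def retrieve(self, task: str, k: int = 3) -> list[Episode]:
        # Return the k episodes whose task embedding is most similar to the new task.
        query = embed(task)
        ranked = sorted(self._items, key=lambda item: cosine(query, item[0]), reverse=True)
        return [episode for _, episode in ranked[:k]]


def format_experience(episodes: list[Episode]) -> str:
    # List failures before successes so they are prominent in the injected context.
    ordered = sorted(episodes, key=lambda e: e.outcome != "failure")
    lines = ["Relevant past experience:"]
    for e in ordered:
        lines.append(
            f"- [{e.outcome.upper()}] {e.task_description}: "
            f"{e.approach_taken}. Lesson: {e.lessons_learned}"
        )
    return "\n".join(lines)
```

In use, the agent wrapper would call `store.add(...)` once a run finishes, then call `format_experience(store.retrieve(new_task, k=3))` and prepend the result to the prompt for the next run.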
Journey Context:
Standard RAG gives agents access to external knowledge (documentation, codebases). But the bigger production problem is that agents don't learn from their own execution history: they attempt the same failing approach, hit the same error, and waste the same tokens. The emerging pattern is episodic memory: structured records of past executions indexed by task similarity. This is fundamentally different from document RAG, because the retrieval is over 'what happened when I tried something like this before,' not 'what does the documentation say.' The critical design decisions: (1) Episodes must be structured (schema: task_description, approach, tools_used, outcome, error_if_any, lesson), not raw transcripts. (2) Failed episodes are more valuable than successful ones because they prevent wasted compute. (3) Indexing must be semantic (task embeddings), not keyword-based, because similar tasks may use different vocabulary. (4) Episodes need periodic consolidation, for example merging 10 similar failed attempts into one generalized lesson, to avoid retrieval noise. Anthropic's memory feature can serve as infrastructure for this pattern. The tradeoff: retrieval adds latency and context tokens to every run, and stale episodes can mislead. Mitigate with recency weighting and outcome-based ranking (rank recent failed episodes above old successful ones).
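The ranking and consolidation ideas above could be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `ScoredEpisode`, `rank_score`, `rank`, and `consolidate` are hypothetical names, the 30-day half-life and 1.5x failure boost are arbitrary starting values, and consolidation here groups failures by an exact `error_if_any` string, a simplification of what would more plausibly be embedding-based clustering or LLM summarization.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScoredEpisode:
    lesson: str
    outcome: str        # "success" or "failure"
    age_days: float     # time since the episode was recorded
    similarity: float   # task-embedding similarity to the new task, in [0, 1]
    error_if_any: str = ""


def rank_score(ep: ScoredEpisode, half_life_days: float = 30.0,
               failure_boost: float = 1.5) -> float:
    # Recency weighting: the weight halves every `half_life_days` (assumed value).
    recency = 0.5 ** (ep.age_days / half_life_days)
    # Outcome-based ranking: failures get a multiplicative boost (assumed value).
    boost = failure_boost if ep.outcome == "failure" else 1.0
    return ep.similarity * recency * boost


def rank(episodes: list[ScoredEpisode], k: int = 3) -> list[ScoredEpisode]:
    return sorted(episodes, key=rank_score, reverse=True)[:k]


def consolidate(episodes: list[ScoredEpisode]) -> list[ScoredEpisode]:
    """Merge failed episodes that share an error signature into one record."""
    groups: dict[str, list[ScoredEpisode]] = defaultdict(list)
    kept: list[ScoredEpisode] = []
    for ep in episodes:
        if ep.outcome == "failure" and ep.error_if_any:
            groups[ep.error_if_any].append(ep)
        else:
            kept.append(ep)  # successes and unlabeled failures pass through unchanged
    for error, members in groups.items():
        if len(members) == 1:
            kept.append(members[0])
            continue
        newest = min(members, key=lambda e: e.age_days)
        kept.append(ScoredEpisode(
            lesson=f"{newest.lesson} (seen {len(members)} times)",
            outcome="failure",
            age_days=newest.age_days,  # keep the freshest timestamp for recency weighting
            similarity=max(e.similarity for e in members),
            error_if_any=error,
        ))
    return kept
```

With these defaults, a 2-day-old failure outranks a 120-day-old success at equal similarity, which matches the ordering the report recommends. Running `consolidate` as a periodic batch job, rather than on every retrieval, keeps the added latency out of the agent's hot path.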
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:20:52.125769+00:00— report_created — created