Report #26780
[frontier] Agent memory retrieval returns irrelevant historical noise or fails to recall specific procedural knowledge from past sessions
Architect memory as two distinct stores: 'Episodic' (session-specific interaction logs, stored as a time series of conversation turns with dense vector embeddings and timestamp metadata) and 'Semantic' (consolidated procedural knowledge, stored as a knowledge graph with entity-relationship embeddings). Implement 'Hybrid Retrieval': embed the query, retrieve the top-K candidates from each store, then rerank the combined set with a cross-encoder, weighting semantic matches higher for 'how-to' queries and episodic recency higher for 'what happened' queries. Periodically consolidate episodic entries into the semantic store via background extraction jobs.
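The retrieval flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the bag-of-words `embed` function stands in for a real embedding model, the weighted score in `hybrid_retrieve` stands in for a cross-encoder reranker, and `MemoryItem`, `classify_intent`, the 1.5 boost, and the one-day recency half-life are all hypothetical names and values chosen for the example.

```python
import math
import re
import time
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    store: str        # "episodic" or "semantic"
    timestamp: float  # seconds since epoch; only used for episodic recency


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a real embedding model)."""
    counts = {}
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        counts[tok] = counts.get(tok, 0) + 1
    return counts


def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify_intent(query):
    """Crude keyword heuristic (assumption: a real system might use a classifier)."""
    q = query.lower()
    if q.startswith("how ") or "steps to" in q:
        return "how_to"
    if "what happened" in q or "last time" in q:
        return "what_happened"
    return "neutral"


def top_k(query_vec, items, k):
    ranked = sorted(items, key=lambda it: cosine(query_vec, embed(it.text)), reverse=True)
    return ranked[:k]


def hybrid_retrieve(query, episodic, semantic, k=3, now=None, half_life_s=86400.0):
    """Retrieve top-k from each store, then rerank the union by intent-aware score."""
    now = time.time() if now is None else now
    qv = embed(query)
    intent = classify_intent(query)
    candidates = top_k(qv, episodic, k) + top_k(qv, semantic, k)

    def score(item):
        sim = cosine(qv, embed(item.text))
        if item.store == "semantic":
            # Favor consolidated procedures for how-to questions.
            return sim * (1.5 if intent == "how_to" else 1.0)
        if intent == "what_happened":
            # Favor recent episodes: exponential decay with the given half-life.
            age = max(0.0, now - item.timestamp)
            return sim * (1.0 + 0.5 ** (age / half_life_s))
        return sim

    return sorted(candidates, key=score, reverse=True)
```

With a recent and an old episodic entry about a deployment plus one semantic how-to entry, a query such as "how to deploy the service" ranks the semantic entry first, while "what happened last time we tried to deploy the service" ranks the recent episodic entry first. In practice the weights would need tuning against real queries.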
Journey Context:
A single vector store mixes procedural knowledge with transient chat history, so retrieval can surface obsolete procedures or miss critical context from a specific past interaction. The separation loosely mirrors the distinction between episodic and semantic memory in human cognition. The episodic store uses temporal embeddings (e.g., Time2Vec) to capture sequence and recency. The semantic store uses GraphRAG or a similar approach for relationship-aware retrieval. Tradeoff: write amplification (updates must be synced to both stores), mitigated by background consolidation jobs that extract semantic triples from episodic logs. Common mistakes: using the same embedding model for both stores rather than fine-tuning for the episodic vs. semantic distinction, and failing to expire old episodic data, which leads to index bloat and privacy risk.
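A background consolidation job with expiry might look roughly like the sketch below. It is illustrative only: the regex in `TRIPLE_RE` is a deliberately naive stand-in for whatever extraction step (for example, an LLM prompt) a real system would use, and `EpisodicEntry`, `SemanticStore`, `consolidate_and_expire`, and the TTL parameter are hypothetical names introduced for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EpisodicEntry:
    text: str
    timestamp: float          # seconds since epoch
    consolidated: bool = False


@dataclass
class SemanticStore:
    # Set of (subject, predicate, object) triples; a real store would be a graph.
    triples: set = field(default_factory=set)

    def add(self, triple):
        self.triples.add(triple)


# Assumption: only sentences shaped like "<subject> requires|uses|depends on <object>"
# are treated as procedural facts.
TRIPLE_RE = re.compile(
    r"^(?P<s>[\w\s-]+?)\s+(?P<p>requires|uses|depends on)\s+(?P<o>[\w\s-]+?)\.?$",
    re.IGNORECASE,
)


def extract_triples(text):
    """Split text into sentences and pull out any that match TRIPLE_RE."""
    found = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        m = TRIPLE_RE.match(sentence.strip())
        if m:
            found.append((m["s"].strip().lower(), m["p"].lower(), m["o"].strip().lower()))
    return found


def consolidate_and_expire(episodic, semantic, now, ttl_s):
    """Extract triples from unconsolidated entries, then drop entries older than ttl_s.

    Consolidation runs before expiry so knowledge in old entries is not lost.
    Returns the list of episodic entries that are kept.
    """
    kept = []
    for entry in episodic:
        if not entry.consolidated:
            for triple in extract_triples(entry.text):
                semantic.add(triple)
            entry.consolidated = True
        if now - entry.timestamp <= ttl_s:
            kept.append(entry)
    return kept
```

Running consolidation before expiry is the design choice that matters here: an episodic entry past its TTL still contributes its triples to the semantic store before the raw log text is discarded.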
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:21:07.530416+00:00— report_created — created