Report #49904
[architecture] Confusing working memory (context) with long-term memory (database)
Architect the agent with distinct memory tiers: L1 (working memory: the immediate context window), L2 (short-term/session memory: recent conversation history in a DB), and L3 (long-term memory: extracted facts and embeddings). Promote data from L2 to L3 only if it passes a "worth remembering" threshold.
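A minimal sketch of the three tiers and the promotion step, assuming in-memory containers stand in for the real stores. The class name `TieredMemory`, the `toy_score` heuristic, and the default capacity and threshold values are all hypothetical and not part of the original report; in practice L2 would be a database table and L3 a vector store.

```python
from collections import deque


class TieredMemory:
    """Hypothetical illustration of L1/L2/L3 memory tiers."""

    def __init__(self, l1_capacity=4, promote_threshold=0.7):
        # L1: working memory, the turns that fit in the context window.
        self.l1 = deque(maxlen=l1_capacity)
        # L2: session memory (stand-in for recent history stored in a DB).
        self.l2 = []
        # L3: long-term memory (stand-in for extracted facts/embeddings).
        self.l3 = []
        # Assumed cutoff for the "worth remembering" test.
        self.promote_threshold = promote_threshold

    def observe(self, turn):
        """Record a turn in L1 and L2; L1 drops its oldest turn when full."""
        self.l1.append(turn)
        self.l2.append(turn)

    def consolidate(self, score):
        """Promote L2 turns that score at or above the threshold into L3."""
        promoted = [
            t for t in self.l2
            if score(t) >= self.promote_threshold and t not in self.l3
        ]
        self.l3.extend(promoted)
        return len(promoted)


def toy_score(turn):
    # Placeholder heuristic: treat stated preferences as worth remembering.
    return 0.9 if "prefer" in turn.lower() else 0.1


mem = TieredMemory(l1_capacity=2)
for turn in ["hi", "I prefer metric units", "what's the weather?"]:
    mem.observe(turn)
mem.consolidate(toy_score)
```

The scoring function is passed in so that the promotion policy (a heuristic, a classifier, or an LLM judgment) can change without touching the tier structure.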
Journey Context:
Treating the context window as the sole memory mechanism limits the agent to a single session. Conversely, querying a large long-term vector DB on every single generation step adds latency and injects noise into the context. The L1/L2/L3 tiering (borrowed from CPU cache hierarchies) keeps immediate, high-fidelity data in the context window and fetches archival data only on demand. The tradeoff is added architectural complexity in exchange for a better balance of latency, accuracy, and cost.
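The on-demand fetch described above might look like the following sketch. It assumes one retrieval per user turn and uses plain keyword overlap as a stand-in for embedding similarity; `build_context`, `keyword_overlap`, `top_k`, and `min_overlap` are hypothetical names chosen for illustration.

```python
def keyword_overlap(query, fact):
    """Count shared lowercase words (a stand-in for vector similarity)."""
    return len(set(query.lower().split()) & set(fact.lower().split()))


def build_context(query, l1_turns, l3_facts, top_k=2, min_overlap=2):
    """Assemble the prompt context for one turn.

    L1 turns are always included. L3 is consulted once per turn, and a
    fact is recalled only if it clears the relevance cutoff, so unrelated
    archival data never reaches the context window.
    """
    ranked = sorted(
        l3_facts, key=lambda f: keyword_overlap(query, f), reverse=True
    )
    recalled = [
        f for f in ranked[:top_k] if keyword_overlap(query, f) >= min_overlap
    ]
    return recalled + list(l1_turns) + [query]


facts = ["user prefers metric units", "user lives in Oslo"]
context = build_context("which units does the user want", ["hello"], facts)
```

The relevance cutoff is what limits noise: a query with no sufficiently related facts gets only its L1 turns back.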
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:14:41.943271+00:00— report_created — created