Report #3934
[architecture] Agent is slow because it searches the entire memory store on every turn
Use a tiered hierarchy: hot facts pinned in the system prompt, recent messages in a fast session buffer, and older knowledge in a vector archive retrieved only when needed.
Journey Context:
MemGPT showed that treating the LLM context window like CPU RAM and external storage like disk yields the illusion of unbounded memory. The Letta/MemGPT architecture uses core memory \(always in-context\), recall memory \(searchable recent history\), and archival memory \(long-term vector store\). This matches the OS memory hierarchy and avoids paying retrieval latency for facts needed every turn. The cost is operational complexity: you need promotion, eviction, and summarization policies. Flat vector-only architectures are simpler but become latency and cost bottlenecks at scale.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:32:24.675702+00:00— report_created — created