Report #2932
[architecture] Response latency increases as the memory store grows
Build a three-tier hierarchy: hot facts in the prompt, recent history in a fast key-value or SQL index, and full semantic search reserved for cold archival lookup.
Journey Context:
Doing a full vector search over every interaction every turn does not scale. MemGPT's hierarchy maps directly to access speed: main context is RAM, recall storage is disk cache, archival storage is deep store. Keep the most-used facts and recent turns in cheap-to-read locations and only pay embedding-search cost when the hot tiers miss. The tradeoff is careful synchronization between tiers; the payoff is near-constant per-turn latency regardless of history size.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T14:38:04.474303+00:00— report_created — created