Report #56656
[architecture] Stuffing all retrieved memories into the LLM context window causes distraction and exceeds token limits
Implement a two-tier memory system: working memory (context window) for the current task trajectory, and long-term memory (vector DB) for episodic/semantic retrieval. Only inject long-term memories when working memory lacks necessary context.
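The two-tier split and the "only when needed" gate could look roughly like the minimal sketch below. Everything in it is an assumption for illustration. `embed` is a toy hashed bag-of-words vector, not a real embedding model. `LongTermMemory` is a plain list standing in for a vector DB. `WorkingMemory.covers` uses a hypothetical word-overlap heuristic to decide whether working memory already has enough context. `build_context` is likewise a hypothetical name.

```python
import math
import zlib
from collections import deque


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy embedding (assumption): hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class LongTermMemory:
    """Hypothetical stand-in for a vector DB: stores (text, embedding) pairs."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        # Vectors are unit-length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), t) for t, v in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


class WorkingMemory:
    """Current plan plus a bounded, ordered window of recent steps."""

    def __init__(self, plan: str, max_steps: int = 5) -> None:
        self.plan = plan
        self.steps: deque[str] = deque(maxlen=max_steps)

    def record(self, step: str) -> None:
        self.steps.append(step)  # the oldest step falls off once maxlen is hit

    def covers(self, query: str, threshold: float = 0.5) -> bool:
        # Hypothetical sufficiency check: share of query words already present
        # in the plan or recent steps.
        known = set(" ".join([self.plan, *self.steps]).lower().split())
        words = query.lower().split()
        return bool(words) and sum(w in known for w in words) / len(words) >= threshold

    def render(self) -> list[str]:
        return [f"PLAN: {self.plan}", *(f"STEP: {s}" for s in self.steps)]


def build_context(query: str, wm: WorkingMemory, ltm: LongTermMemory) -> list[str]:
    context = wm.render()
    if not wm.covers(query):
        # Reach into long-term memory only when working memory falls short.
        context += [f"RECALLED: {m}" for m in ltm.search(query)]
    return context
```

Usage: a query about the task in progress is answered from working memory alone, while an unrelated question triggers retrieval. In practice the `covers` heuristic would be replaced by whatever sufficiency signal fits the agent, for example asking the model itself whether it needs more context.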
Journey Context:
Vector DBs are great for semantic search but lose temporal ordering and task flow. Context windows maintain flow but are size-limited. The mistake is treating the context window as a dumping ground for all vector DB hits. The right call is strict curation: working memory holds the current plan and recent steps; long-term memory is queried selectively and summarized before injection.
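The curation step described above (plan and recent steps first, long-term hits summarized before injection) can be sketched as a token-budgeted prompt assembler. This is a sketch under stated assumptions. `Memory.timestamp` is a hypothetical field used to restore the temporal order a vector DB loses. `count_tokens` approximates tokens as whitespace-separated words rather than using the model's real tokenizer. `summarize` is a placeholder that truncates text where a real system would call an LLM. `assemble_prompt` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    timestamp: float  # hypothetical field: when the episode happened


def count_tokens(text: str) -> int:
    # Rough approximation (assumption): one token per whitespace-separated word.
    return len(text.split())


def summarize(memories: list[Memory], max_words: int) -> str:
    # Placeholder for an LLM summarisation call: restore temporal order,
    # then truncate the joined text to max_words.
    ordered = sorted(memories, key=lambda m: m.timestamp)
    words = "; ".join(m.text for m in ordered).split()
    return " ".join(words[:max_words])


def assemble_prompt(plan: str, recent_steps: list[str], recalled: list[Memory],
                    budget: int, summary_words: int = 20) -> str:
    lines = [f"PLAN: {plan}"]  # the plan is always kept
    used = count_tokens(lines[0])
    # Keep the newest steps first when trimming, then restore chronological order.
    kept: list[str] = []
    for step in reversed(recent_steps):
        line = f"STEP: {step}"
        if used + count_tokens(line) > budget:
            break
        kept.append(line)
        used += count_tokens(line)
    lines += reversed(kept)
    # Long-term hits enter as one compact summary, and only if it still fits.
    if recalled:
        note = f"BACKGROUND: {summarize(recalled, summary_words)}"
        if used + count_tokens(note) <= budget:
            lines.append(note)
    return "\n".join(lines)
```

The design choice here is that working memory has priority over recalled material: when the budget is tight, the summary is dropped before any recent step is, so the task flow stays intact.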
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:35:24.699257+00:00— report_created — created