Report #66630
[architecture] Retrieved memories polluting the active context window
Use a two-pass retrieval and scoring system: first retrieve candidate memories via vector similarity, then score them against the current task intent and temporal relevance before injecting. Cap injected memory tokens to a fixed budget (e.g., 20% of context window) and summarize older memories.
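A minimal sketch of the two-pass approach, not a verified implementation. Every name here (`Memory`, `select_memories`, `count_tokens`, `half_life_hours`, `intent_weight`) is hypothetical. The exponential recency decay and the weighted-sum score are assumptions chosen for illustration, and the whitespace token count is a stand-in for a real tokenizer. Summarization of older memories is left out.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Memory:
    text: str
    embedding: List[float]
    age_hours: float  # time since the memory was stored


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def count_tokens(text: str) -> int:
    # Assumption: crude whitespace count; swap in the model's tokenizer.
    return len(text.split())


def select_memories(
    query_vec: List[float],
    intent_vec: List[float],
    memories: List[Memory],
    context_window: int,
    top_k: int = 20,
    budget_ratio: float = 0.2,      # the 20% cap from the report
    half_life_hours: float = 72.0,  # assumed recency half-life
    intent_weight: float = 0.7,     # assumed weighting, tune per workload
) -> Tuple[List[Memory], int]:
    # Pass 1: candidate retrieval by vector similarity to the query.
    candidates = sorted(
        memories, key=lambda m: cosine(query_vec, m.embedding), reverse=True
    )[:top_k]

    # Pass 2: re-rank by task intent and temporal relevance.
    def score(m: Memory) -> float:
        recency = 0.5 ** (m.age_hours / half_life_hours)
        intent = cosine(intent_vec, m.embedding)
        return intent_weight * intent + (1 - intent_weight) * recency

    ranked = sorted(candidates, key=score, reverse=True)

    # Enforce the token budget: take memories in ranked order, skipping
    # any that would overflow the budget.
    budget = int(context_window * budget_ratio)
    selected: List[Memory] = []
    used = 0
    for m in ranked:
        cost = count_tokens(m.text)
        if used + cost > budget:
            continue
        selected.append(m)
        used += cost
    return selected, used
```

Skipped memories are the natural candidates for summarization before a second attempt to fit them, instead of being dropped outright.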
Journey Context:
Agents often dump top-K vector search results directly into the prompt. This introduces noise, can contradict recent instructions, and crowds out the actual user query. The tradeoff is between giving the LLM 'all the context' and giving it 'high-signal context'. Enforcing a token budget and re-ranking for task relevance prevents old, only slightly similar memories from overriding current directives.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:18:57.865170+00:00— report_created — created