Report #96944
[architecture] Old memories polluting current context window and degrading response accuracy
Implement a two-phase retrieval pipeline. Phase one runs semantic search to fetch candidate memories. Phase two passes those candidates through an LLM-as-a-judge or cross-encoder reranker that filters them strictly against the current task context before anything is injected into the prompt.
Journey Context:
Agents commonly do top-k vector retrieval and dump the results straight into the prompt. This pulls in semantically similar but temporally outdated or contextually irrelevant facts (e.g., a deprecated API endpoint), eating up valuable context window space and confusing the LLM. Reranking or filtering ensures only high-signal, currently relevant memories make it into the working context.
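A minimal sketch of the two-phase idea, under loud assumptions: everything named here (`Memory`, `embed`, `retrieve_candidates`, `filter_with_judge`, `toy_judge`, `build_context`, the `deprecated` flag, the thresholds) is hypothetical and not taken from any particular agent framework. The bag-of-words `embed` stands in for a real embedding model, and `toy_judge` stands in for an LLM-as-a-judge call or a cross-encoder score.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Memory:
    text: str
    deprecated: bool = False  # hypothetical metadata flag, for illustration only


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a stand-in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(task: str, store: List[Memory], k: int) -> List[Memory]:
    # Phase 1: similarity search returns the top-k candidates.
    # Similar-but-stale memories can still rank highly here.
    q = embed(task)
    ranked = sorted(store, key=lambda m: cosine(q, embed(m.text)), reverse=True)
    return ranked[:k]


def filter_with_judge(
    task: str,
    candidates: List[Memory],
    judge: Callable[[str, Memory], float],
    threshold: float,
) -> List[Memory]:
    # Phase 2: keep only candidates the judge scores at or above the threshold.
    return [m for m in candidates if judge(task, m) >= threshold]


def toy_judge(task: str, memory: Memory) -> float:
    # Placeholder for an LLM-as-a-judge or cross-encoder relevance score.
    # Assumption: stale memories are detectable (here via a metadata flag).
    if memory.deprecated:
        return 0.0
    return cosine(embed(task), embed(memory.text))


def build_context(
    task: str, store: List[Memory], k: int = 5, threshold: float = 0.2
) -> List[Memory]:
    candidates = retrieve_candidates(task, store, k)
    return filter_with_judge(task, candidates, toy_judge, threshold)
```

In a real system the judge would likely be the expensive step, so keeping `k` small in phase one bounds the number of judge calls per task. Only the output of `build_context` would be injected into the prompt.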
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:18:16.322922+00:00— report_created — created