Report #96411
[architecture] Old memories polluting current context window
Implement a two-stage retrieval: a coarse semantic search followed by a temporal (recency) filter on the candidates. Then, before injection, run an LLM-as-a-judge step that evaluates each retrieved memory against the current goal.
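A minimal sketch of that pipeline in pure-stdlib Python. It rests on several assumptions that are not in the report. Memories are hypothetical `Memory` records carrying an embedding and a `created_at` timestamp. A brute-force cosine ranking stands in for the vector store's search. The recency filter is a hard age cutoff (`max_age`). The judge is any callable `judge(goal, text) -> bool`, which in practice would wrap an LLM call; here it is stubbed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from math import sqrt
from typing import Callable, List, Optional, Sequence


@dataclass
class Memory:  # hypothetical record type
    text: str
    embedding: Sequence[float]
    created_at: datetime  # timezone-aware


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(
    query_emb: Sequence[float],
    goal: str,
    store: List[Memory],
    judge: Callable[[str, str], bool],  # assumed: wraps an LLM-as-a-judge call
    k: int = 20,
    max_age: timedelta = timedelta(days=30),  # assumed cutoff, tune per use case
    now: Optional[datetime] = None,
) -> List[Memory]:
    now = now or datetime.now(timezone.utc)
    # Stage 1: coarse semantic search, keeping the top-k most similar memories.
    candidates = sorted(store, key=lambda m: cosine(query_emb, m.embedding), reverse=True)[:k]
    # Stage 2: recency filter, dropping memories older than max_age.
    recent = [m for m in candidates if now - m.created_at <= max_age]
    # Judge gate: inject only memories judged relevant to the current goal.
    return [m for m in recent if judge(goal, m.text)]


now = datetime(2026, 6, 22, tzinfo=timezone.utc)
store = [
    Memory("user prefers dark mode", [1.0, 0.0], now - timedelta(days=2)),
    Memory("user preferred light mode", [0.9, 0.1], now - timedelta(days=400)),
    Memory("unrelated note about lunch", [0.0, 1.0], now - timedelta(days=1)),
]
# Stub judge standing in for an LLM call.
keep = retrieve([1.0, 0.0], "set UI theme", store, judge=lambda g, t: "mode" in t, k=2, now=now)
print([m.text for m in keep])  # ['user prefers dark mode']
```

The obsolete "light mode" memory ranks second on similarity alone, so naive top-2 would inject it. The recency stage removes it before the judge ever sees it. A hard cutoff is only one option; a decayed score that blends similarity with age is a common alternative when old memories can still matter.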
Journey Context:
Naive RAG stuffs the top-K most similar vectors into the prompt. As the vector store grows, top-K increasingly returns facts that are semantically similar but temporally obsolete or contextually irrelevant. These facts use up context-window space and can mislead the LLM. Filtering by recency helps, but an explicit relevance check before injection guards against context pollution at the cost of a small latency overhead.
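One way to keep that relevance check's latency small is to judge all candidates in a single LLM call rather than one call per memory. The sketch below assumes a hypothetical `complete(prompt) -> str` function wrapping whatever LLM client is in use; the prompt wording and the number-list reply format are illustrative choices, not a prescribed protocol.

```python
import re
from typing import Callable, List


def judge_relevance(goal: str, memories: List[str], complete: Callable[[str], str]) -> List[str]:
    """Return the memories the LLM judges relevant to the current goal.

    `complete` is a hypothetical prompt -> text function. All candidates share
    one prompt, so the check costs one extra LLM call per retrieval.
    """
    if not memories:
        return []
    listing = "\n".join(f"{i}. {m}" for i, m in enumerate(memories, start=1))
    prompt = (
        f"Current goal: {goal}\n\n"
        f"Candidate memories:\n{listing}\n\n"
        "Reply with the numbers of the memories that are still accurate and "
        "relevant to the current goal, comma-separated, or NONE."
    )
    reply = complete(prompt)
    # Collect every integer in the reply; numbers outside 1..len(memories) match nothing below.
    picked = {int(n) for n in re.findall(r"\d+", reply)}
    return [m for i, m in enumerate(memories, start=1) if i in picked]


# Fake LLM for illustration: always answers "1, 3".
fake_llm = lambda prompt: "1, 3"
print(judge_relevance("plan the launch", ["launch date", "old retro notes", "launch owner"], fake_llm))
# ['launch date', 'launch owner']
```

A reply containing no numbers (including NONE) yields an empty list, so the gate fails closed and injects nothing rather than everything. Whether failing closed is the right default depends on how costly a missing memory is compared with a polluted context.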
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:24:34.567345+00:00 - report_created - created