Report #39596
[architecture] Old memories polluting new context window
Implement two-stage retrieval: run a vector search for candidate memories, then score each candidate with an LLM-as-a-judge for relevance to the current prompt before injecting it into the context.
Journey Context:
Dumping the top-K vector results straight into the prompt works at first but degrades as the vector store grows: old memories that are semantically similar but contextually irrelevant confuse the LLM. The tradeoff is the added latency and cost of a second LLM call in exchange for accuracy. The second stage is the right call because context window space is the most expensive real estate in an LLM call, and bad context actively degrades reasoning.
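A minimal sketch of the two-stage flow described above, not a tested implementation from this report. Every name here (`Memory`, `cosine`, `retrieve`, the `judge` callable) is hypothetical, and the `top_k`, `min_score`, and `max_injected` defaults are illustrative assumptions. In a real system `judge` would wrap an LLM call that returns a relevance score between 0 and 1, and stage 1 would query an actual vector store rather than scan a list.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, List, Sequence


@dataclass
class Memory:
    """Hypothetical memory record: raw text plus its precomputed embedding."""
    text: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    prompt: str,
    prompt_embedding: Sequence[float],
    store: List[Memory],
    judge: Callable[[str, str], float],  # assumed: (prompt, memory_text) -> score in [0, 1]
    top_k: int = 20,          # assumed candidate pool size for stage 1
    min_score: float = 0.5,   # assumed relevance cutoff for stage 2
    max_injected: int = 5,    # assumed cap on memories placed in the context
) -> List[Memory]:
    # Stage 1: cheap vector search over the store for candidate memories.
    candidates = sorted(
        store,
        key=lambda m: cosine(prompt_embedding, m.embedding),
        reverse=True,
    )[:top_k]

    # Stage 2: score each candidate against the current prompt with the judge,
    # drop anything below the cutoff, and keep only the best few.
    scored = [(judge(prompt, m.text), m) for m in candidates]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in kept[:max_injected]]
```

Passing the judge in as a callable keeps the sketch independent of any particular LLM client and makes the filtering step easy to test with a stub. The second stage costs one judge call per candidate, so `top_k` directly controls the added latency and cost noted above.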
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:56:17.121240+00:00— report_created — created