Report #12818
[architecture] Retrieved memories polluting current context window
Implement a two-stage retrieval pipeline: vector search for recall, followed by an LLM-based relevance-scoring or heuristic filtering step that decides which results are injected into the prompt. Keep working memory lean by injecting only what the current task needs.
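A minimal sketch of the two-stage pipeline follows. It assumes each memory is stored with a precomputed embedding. The in-memory cosine search stands in for a vector DB query. `keyword_overlap` is a hypothetical heuristic scorer. An LLM-based scorer would be any callable with the same `(task, memory) -> float` signature, for example one that wraps a prompt asking the model for a 0 to 1 relevance rating (not shown). The names `Memory`, `recall`, `retrieve`, `k`, and `threshold` are illustrative and do not come from an existing library.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Memory:
    text: str
    embedding: list[float]  # assumed to be precomputed at write time


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(query_emb: Sequence[float], memories: list[Memory], k: int) -> list[Memory]:
    """Stage 1: top-k by semantic similarity (stand-in for a vector DB query)."""
    return sorted(memories, key=lambda m: cosine(query_emb, m.embedding), reverse=True)[:k]


def keyword_overlap(task: str, memory: Memory) -> float:
    """Hypothetical heuristic scorer: fraction of task terms that appear in the memory."""
    task_terms = set(task.lower().split())
    mem_terms = set(memory.text.lower().split())
    return len(task_terms & mem_terms) / len(task_terms) if task_terms else 0.0


def retrieve(
    task: str,
    query_emb: Sequence[float],
    memories: list[Memory],
    k: int = 5,
    threshold: float = 0.3,
    scorer: Callable[[str, Memory], float] = keyword_overlap,
) -> list[Memory]:
    """Recall broadly, then keep only candidates the scorer judges relevant to the task."""
    candidates = recall(query_emb, memories, k)
    # Stage 2: a similar-but-irrelevant memory is dropped here instead of reaching the prompt.
    return [m for m in candidates if scorer(task, m) >= threshold]
```

A larger `k` favors recall in stage 1. `threshold` controls how aggressively stage 2 prunes before anything reaches the context window.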
Journey Context:
Vector DBs return the top-k results by semantic similarity, but similarity is not the same as relevance to the current task. Injecting the raw top-k results wastes context-window tokens, increases latency, and degrades instruction following. The tradeoff is an extra LLM call (or a cheaper heuristic filter) versus context pollution. Filtering is the right call: irrelevant memories in the context window can badly distract the model from the task at hand, whereas a slightly slower retrieval step preserves reasoning quality.
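Stage 2 can also enforce a hard context budget, which addresses the token-waste point directly. The sketch below greedily packs the highest-relevance memories until an assumed token budget is reached. `approx_tokens` is a crude whitespace word count standing in for the model's real tokenizer, which is an assumption. `ScoredMemory`, `pack_context`, `budget_tokens`, and `min_relevance` are hypothetical names. Greedy packing is a heuristic, not an optimal knapsack solution.

```python
from dataclasses import dataclass


@dataclass
class ScoredMemory:
    text: str
    relevance: float  # output of the stage-2 scorer; higher means more relevant


def approx_tokens(text: str) -> int:
    """Assumed token estimate: whitespace word count, not a real tokenizer."""
    return len(text.split())


def pack_context(
    memories: list[ScoredMemory], budget_tokens: int, min_relevance: float = 0.0
) -> list[ScoredMemory]:
    """Greedily select the most relevant memories that fit within budget_tokens."""
    selected: list[ScoredMemory] = []
    used = 0
    for m in sorted(memories, key=lambda m: m.relevance, reverse=True):
        if m.relevance < min_relevance:
            break  # sorted descending, so every remaining memory is below the cutoff too
        cost = approx_tokens(m.text)
        if used + cost > budget_tokens:
            continue  # too large for what is left; a shorter memory may still fit
        selected.append(m)
        used += cost
    return selected
```

Skipping an oversized memory instead of stopping lets a shorter, still-relevant memory use the remaining budget, while `min_relevance` keeps low-scoring filler out even when budget is left over.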
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T17:08:01.999360+00:00— report_created — created