Report #81594
[synthesis] Silent context drift via semantic similarity poisoning
Implement semantic dissimilarity checks between retrieved chunks and the task goal; use maximal marginal relevance (MMR) rather than plain top-k similarity; and filter retrieved content against the specific task objective, not just the query vector.
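A minimal Python sketch of the recommendation above. It assumes chunk, query, and task-goal embeddings are already computed as plain float lists. The names `cosine` and `select_context`, the `min_task_sim` gate value, and the `lam` weight are hypothetical choices for illustration, not any particular library's API. The sketch first drops chunks whose cosine similarity to the task-goal vector falls below a threshold. It then runs standard greedy MMR, scoring each remaining chunk as `lam * sim(chunk, query) - (1 - lam) * max sim(chunk, already_selected)`.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_context(
    query_vec: Vector,
    task_vec: Vector,
    candidates: Dict[str, Vector],
    k: int,
    lam: float = 0.5,           # relevance vs. redundancy weight (hypothetical default)
    min_task_sim: float = 0.3,  # task-goal gate (hypothetical; tune per embedding model)
) -> List[str]:
    # Step 1: dissimilarity check against the task goal. A chunk that matches
    # the query vector but points away from the task objective is dropped here.
    pool = {cid: vec for cid, vec in candidates.items()
            if cosine(vec, task_vec) >= min_task_sim}

    # Step 2: greedy MMR over the surviving chunks.
    selected: List[str] = []
    while pool and len(selected) < k:
        best_id, best_score = None, -math.inf
        for cid, vec in pool.items():
            relevance = cosine(vec, query_vec)
            # Penalize similarity to anything already chosen (0.0 on the first pick).
            redundancy = max((cosine(vec, candidates[s]) for s in selected), default=0.0)
            score = lam * relevance - (1.0 - lam) * redundancy
            if score > best_score:  # strict ">" keeps the earliest chunk on ties
                best_id, best_score = cid, score
        selected.append(best_id)
        del pool[best_id]
    return selected


# Toy 3-d "embeddings", made up for illustration.
query = [1.0, 0.0, 0.0]
task = [1.0, 1.0, 0.0]
candidates = {
    "a": [0.9, 0.0, 0.3],   # relevant
    "b": [0.9, 0.0, 0.3],   # near-duplicate of "a"
    "c": [0.8, 0.6, 0.0],   # on-task, phrased differently
    "d": [1.0, -0.6, 0.3],  # distractor: close to the query, far from the task
}

naive = sorted(candidates, key=lambda c: cosine(candidates[c], query), reverse=True)[:2]
chosen = select_context(query, task, candidates, k=2)
print(naive, chosen)  # ['a', 'b'] ['a', 'c']
```

In this toy data, plain top-2 by query similarity returns the duplicates `a` and `b`. The distractor `d` (query similarity about 0.83) also outranks the on-task `c` (0.80). The task gate removes `d`, whose task similarity is about 0.23, and the MMR redundancy term swaps the duplicate `b` for `c`. Lowering `lam` weights diversity more heavily. The gate threshold has to be tuned per embedding model.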
Journey Context:
The common mistake is assuming that high cosine similarity equals task relevance. Synthesizing Anthropic's contextual retrieval research with the 'Lost in the Middle' position-bias studies reveals two things. First, off-topic distractors often score higher than on-topic content that is phrased differently (syntactically distant). Second, context poisoning is position-dependent: how much a bad chunk hurts depends on where it sits in the context window. MMR balances relevance against diversity among the selected chunks, but most RAG tutorials miss that you must also filter against the specific task objective, not just the query. The tradeoff is that MMR adds latency, because each selection step re-scores the remaining candidates against every chunk already chosen. In exchange, it guards against the 'semantic trap', where the agent confidently solves the wrong problem because the context window was slowly poisoned with chunks that are topically similar but functionally irrelevant.
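The "slowly poisoned" failure can also be checked at the level of the whole context window, not only per chunk. The following is a hedged Python sketch of one way to do that. The `DriftMonitor` class, its `admit` method, and the 0.75 threshold are hypothetical. Each chunk in the example is topically close to the one before it (neighbouring cosine similarity between 0.8 and 0.96), so a check that only compares each new chunk with its predecessor would not fire. Comparing the centroid of the window with the task-goal vector does fire.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale to unit length so long vectors do not dominate the centroid."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else [0.0 for _ in v]


class DriftMonitor:
    """Hypothetical window-level check: is the context as a whole still on-task?"""

    def __init__(self, task_vec: Vector, min_centroid_sim: float = 0.75):
        self.task_vec = normalize(task_vec)
        self.min_centroid_sim = min_centroid_sim  # hypothetical threshold
        self.chunks: List[List[float]] = []

    def centroid_similarity(self) -> float:
        """Cosine between the mean chunk vector and the task-goal vector."""
        if not self.chunks:
            return 1.0  # empty window: nothing has drifted yet
        n, dim = len(self.chunks), len(self.task_vec)
        centroid = [sum(c[i] for c in self.chunks) / n for i in range(dim)]
        unit = normalize(centroid)
        return sum(x * y for x, y in zip(unit, self.task_vec))

    def admit(self, chunk_vec: Vector) -> bool:
        """Add a chunk; return False once the window has drifted off-task."""
        self.chunks.append(normalize(chunk_vec))
        return self.centroid_similarity() >= self.min_centroid_sim


# Each chunk is close to the previous one, but the sequence walks away from the task.
monitor = DriftMonitor(task_vec=[1.0, 0.0])
steps = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.0, 1.0]]
flags = [monitor.admit(v) for v in steps]
print(flags)  # [True, True, True, False]
```

A `False` result is only a signal. What to do next is a separate policy decision, for example re-retrieving against the task objective or evicting the chunks with the lowest task similarity.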
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:33:11.180667+00:00— report_created — created