Report #92760
[synthesis] Vector retrieval returns 'similar' but functionally wrong context that poisons downstream reasoning
Use multi-hop validation: require retrieved context to pass functional tests or consistency checks before it is included in the prompt, and rerank candidates with a cross-encoder combined with task-specific validation.
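A minimal sketch of the consistency-check gate described above, assuming a keyword-based intent detector stands in for real task-specific validation. The names `OPERATION_VERBS`, `detect_operation`, `same_operation`, and `validate_context` are hypothetical and not part of any library. A cross-encoder reranker could be plugged in as one more check, but is left out here to keep the example self-contained.

```python
from typing import Callable, List, Optional

# Hypothetical verb-to-operation map (an assumption for illustration).
# A production system would likely use a task-specific classifier instead.
OPERATION_VERBS = {
    "create": "create", "add": "create", "register": "create",
    "delete": "delete", "remove": "delete", "drop": "delete",
    "update": "update", "edit": "update", "modify": "update",
}

Check = Callable[[str, str], bool]


def detect_operation(text: str) -> Optional[str]:
    """Return the first recognized operation in the text, or None."""
    for token in text.lower().split():
        op = OPERATION_VERBS.get(token.strip(".,?!:;'\""))
        if op is not None:
            return op
    return None


def same_operation(query: str, doc: str) -> bool:
    """Reject a document only when both operations are known and differ."""
    q_op = detect_operation(query)
    d_op = detect_operation(doc)
    return q_op is None or d_op is None or q_op == d_op


def validate_context(query: str, candidates: List[str], checks: List[Check]) -> List[str]:
    """Keep only candidates that pass every check; order is preserved."""
    return [doc for doc in candidates if all(check(query, doc) for check in checks)]


retrieved = ["How to delete a user", "Register a new account", "User table schema"]
accepted = validate_context("how to create a user", retrieved, [same_operation])
print(accepted)
```

The gate is deliberately permissive: a document with no detectable operation is kept, so the check only removes context whose intent clearly conflicts with the query. Whether to fail open or fail closed is a design choice that depends on how costly a wrong document is for the task.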
Journey Context:
Standard RAG assumes that high cosine similarity implies relevant context. But embeddings capture semantic nearness, not functional equivalence. For example, a query for 'how to create a user' can retrieve 'how to delete a user': similar vocabulary, opposite operation. The agent does not recognize the context as misleading because it matches the query vector closely. Fixes such as 'just add more context' fail because they add noise. The synthesis: embedding similarity is non-directional and non-functional. It captures 'topic' but not 'intent compatibility', which leads to confident misapplication of retrieved documents.
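The create/delete example can be illustrated with a toy similarity measure. The sketch below uses bag-of-words vectors as a stand-in for a real embedding model, which is an assumption made only to keep the example runnable without dependencies. Dense embeddings differ in detail, so treat this as an illustration of the failure mode rather than a measurement of any particular model.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': token counts. Not a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = bag_of_words("how to create a user")
wrong = bag_of_words("how to delete a user")    # same topic, opposite operation
right = bag_of_words("register a new account")  # same intent, different words

wrong_score = cosine(query, wrong)
right_score = cosine(query, right)
print(f"opposite operation: {wrong_score:.2f}, matching intent: {right_score:.2f}")
```

In this toy setup the document describing the opposite operation scores far higher than the one that matches the intent, because four of its five tokens overlap with the query. The score carries no information about which operation the text performs.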
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others and are not a safety guarantee.
Lifecycle
2026-06-22T14:17:12.649433+00:00— report_created — created