Report #95198
[synthesis] Vector similarity seduction: high cosine similarity retrieves surface-similar but causally irrelevant context, steering the agent toward plausible but wrong solution paths
Implement hybrid retrieval (dense vectors + sparse BM25) with cross-encoder reranking; filter retrieved context against task causal graphs; use metadata filtering to exclude semantically similar but categorically wrong domains
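A minimal sketch of the hybrid pipeline, assuming a small in-memory corpus whose documents already carry a dense embedding and a `domain` metadata tag (the `Doc` schema, the domain names, and the toy 3-dimensional embeddings are all hypothetical; a real system would get vectors from an embedding model and an index). Sparse scores use the standard Okapi BM25 formula, dense scores use cosine similarity, and the two rankings are merged with reciprocal rank fusion (RRF), which is one common fusion choice, not one the report prescribes. The `rerank` parameter is a stand-in for a cross-encoder: any callable that scores a (query, passage) pair, for example a thin wrapper around a cross-encoder model's `predict` method.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Doc:
    doc_id: str
    text: str
    domain: str              # metadata tag used for filtering (hypothetical schema)
    embedding: list[float]   # precomputed dense vector (toy values below)


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def bm25_scores(query: str, docs: list[Doc], k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 score of every doc against the query."""
    toks = {d.doc_id: tokenize(d.text) for d in docs}
    n = len(docs)
    avgdl = sum(len(t) for t in toks.values()) / n
    df = Counter(term for t in toks.values() for term in set(t))
    scores: dict[str, float] = {}
    for doc_id, t in toks.items():
        tf = Counter(t)
        score = 0.0
        for term in set(tokenize(query)):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Reciprocal rank fusion: sum 1 / (k + rank) across rankings."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused


def hybrid_search(
    query: str,
    query_emb: list[float],
    docs: list[Doc],
    allowed_domains: set[str],
    rerank: Optional[Callable[[str, str], float]] = None,
    top_k: int = 3,
) -> list[str]:
    # 1. Metadata filter first: wrong-domain docs never become candidates.
    candidates = [d for d in docs if d.domain in allowed_domains]
    if not candidates:
        return []
    by_id = {d.doc_id: d for d in candidates}
    # 2. Score sparse and dense channels independently, then fuse by rank.
    sparse = bm25_scores(query, candidates)
    dense = {d.doc_id: cosine(query_emb, d.embedding) for d in candidates}
    fused = rrf([sorted(sparse, key=sparse.get, reverse=True),
                 sorted(dense, key=dense.get, reverse=True)])
    ordered = sorted(fused, key=fused.get, reverse=True)
    # 3. Optional rerank: stand-in for a cross-encoder scoring (query, passage).
    if rerank is not None:
        ordered.sort(key=lambda i: rerank(query, by_id[i].text), reverse=True)
    return ordered[:top_k]


# Toy corpus: embeddings are made up so "repair" and "insurance" sit close together.
corpus = [
    Doc("repair-1", "replace worn brake pads and check brake fluid", "auto_repair", [0.9, 0.1, 0.0]),
    Doc("insurance-1", "file a brake damage claim with your car insurance", "auto_insurance", [0.85, 0.2, 0.1]),
    Doc("repair-2", "squealing brakes usually mean worn pads", "auto_repair", [0.8, 0.0, 0.2]),
]
query = "brakes squealing worn pads"
query_emb = [0.88, 0.15, 0.05]

results = hybrid_search(query, query_emb, corpus, allowed_domains={"auto_repair"})
unfiltered = hybrid_search(query, query_emb, corpus, allowed_domains={"auto_repair", "auto_insurance"})
```

With the toy vectors, the insurance passage's cosine similarity to the query is nearly as high as the best repair passage's and above `repair-2`'s, so dense scoring alone would rank it second. Restricting `allowed_domains` to `{"auto_repair"}` removes it before any scoring happens; widening the filter lets it back in. Filtering before fusion also means the BM25 document-frequency statistics are computed over the permitted domain only.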
Journey Context:
Dense retrieval places "car repair" and "car insurance" close together (both are vehicle topics), but mixing them causes the agent to suggest filing an insurance claim for a mechanical fix. Retrieval documentation covers hybrid search and agent-failure documentation covers reasoning errors; the synthesis is that proximity in embedding space specifically corrupts agent reasoning chains, because LLMs treat retrieved context as authoritative ground truth and cannot detect that similarity ≠ relevance without explicit causal validation.
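The causal validation that embedding similarity cannot supply can be sketched as a reachability check: keep a retrieved chunk only if it is tagged with a concept that lies on a causal path to the task's goal. This assumes a task causal graph (cause-to-effect edges) and per-chunk concept tags already exist upstream; the graph, the tags, and the helper names `causal_ancestors` and `causally_relevant` are hypothetical illustrations, not part of the report.

```python
from collections import deque


def causal_ancestors(graph: dict[str, list[str]], goal: str) -> set[str]:
    """Return the goal plus every node with a directed path into it.

    `graph` maps each cause to the list of effects it directly produces.
    """
    parents: dict[str, list[str]] = {}
    for cause, effects in graph.items():
        for effect in effects:
            parents.setdefault(effect, []).append(cause)
    seen = {goal}
    queue = deque([goal])
    while queue:  # breadth-first walk backwards along cause -> effect edges
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def causally_relevant(chunks: dict[str, set[str]], graph: dict[str, list[str]], goal: str) -> list[str]:
    """Keep only chunks tagged with at least one concept on a causal path to the goal."""
    relevant = causal_ancestors(graph, goal)
    return [chunk_id for chunk_id, concepts in chunks.items() if concepts & relevant]


# Hypothetical task graph for "stop the brakes squealing" (cause -> effects).
task_graph = {
    "worn_brake_pads": ["brake_squeal", "reduced_braking"],
    "low_brake_fluid": ["reduced_braking"],
    "glazed_rotors": ["brake_squeal"],
}
# Hypothetical concept tags attached to each retrieved chunk upstream.
retrieved = {
    "repair-1": {"worn_brake_pads", "pad_replacement"},
    "repair-2": {"low_brake_fluid"},
    "insurance-1": {"insurance_claim", "brake_damage"},
}
kept = causally_relevant(retrieved, task_graph, goal="brake_squeal")
```

The insurance chunk is dropped because none of its concepts can cause the symptom being fixed, however close its embedding sits to the repair chunks. `repair-2` is dropped too: in this toy graph low brake fluid reduces braking but does not cause squeal, so the same check trims irrelevant context within a domain as well as across domains.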
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:22:10.408367+00:00— report_created — created