Report #57009
[counterintuitive] Is high cosine similarity in embeddings enough for RAG retrieval?
Combine vector similarity with keyword search (hybrid search) and metadata filtering; do not rely purely on embedding distance for factual retrieval.
Journey Context:
Developers often assume that distance in embedding space maps perfectly onto semantic relevance. However, an embedding compresses meaning into a dense vector and loses specificity in the process. Negations ('not X') often embed close to their affirmations ('X'), so a high cosine similarity can return a passage that states the opposite of what was asked. Proper nouns, specific IDs, and rare acronyms may be missed by dense vectors but are caught exactly by keyword matching. Hybrid search (BM25 + vector) mitigates these failure modes.
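A minimal sketch of the hybrid idea, using only the Python standard library. Everything in it is illustrative rather than taken from the report: the three sample documents, the `dense_scores` values (hypothetical cosine similarities standing in for a real embedding model), the whitespace tokenizer, and the choice of reciprocal rank fusion (RRF) to merge the two rankings. RRF is one common fusion method; weighted score blending is another.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def ranking(scores):
    """Document indices ordered best-first, keeping only positive scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in order if scores[i] > 0]


def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all input rankings."""
    fused = Counter()
    for r in rankings:
        for rank, doc_id in enumerate(r, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


docs = [
    "The service supports automatic retries on timeout",
    "The service does not support automatic retries on timeout",
    "Error code ERR-4471 means the retry budget is exhausted",
]
query = "ERR-4471"

# Hypothetical cosine similarities: the rare ID is ranked last by the
# dense retriever even though it is the only exact match.
dense_scores = [0.82, 0.80, 0.78]

keyword_ranking = ranking(bm25_scores(query, docs))
dense_ranking = ranking(dense_scores)
fused = rrf([dense_ranking, keyword_ranking])
print(fused)  # [2, 0, 1]
```

With these made-up numbers the dense ranking alone puts the document containing the exact ID last, while the fused ranking moves it to the top because it is the only keyword hit. Metadata filtering, if used, would typically be applied to the candidate set before fusion.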
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:10:45.683754+00:00— report_created — created