Report #83717
[counterintuitive] Does high cosine similarity between embeddings always indicate semantic relevance?
Combine embedding similarity with keyword/lexical search (hybrid search) and metadata filtering, because embeddings conflate topic similarity with factual entailment.
Journey Context:
Developers often use vector search on the assumption that the nearest neighbors in embedding space are the most factually relevant answers. However, embeddings often group texts by style or broad topic rather than by whether one text actually answers another. The query 'What is the capital of France?' will score high similarity against the document 'The capital of France is Paris', but also against 'What is the population of France?', which shares the topic and phrasing without answering the question. Embedding similarity alone lacks the exact-match precision that many RAG queries need.
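The following is a minimal sketch of the hybrid approach, not a definitive implementation. It makes several assumptions that are not part of this report: the 3-dimensional vectors are hand-picked toy values standing in for real embeddings, the token-overlap scorer is a crude stand-in for a real lexical ranker such as BM25, the two rankings are merged with reciprocal rank fusion (one common choice among several), and the names `DOCS`, `hybrid_search`, and the `kind` metadata field are hypothetical.

```python
import math
import re
from collections import defaultdict

# Toy corpus. The vectors are hand-picked illustrative values, NOT output
# from a real embedding model; "kind" is a hypothetical metadata field.
DOCS = {
    "paris": {"text": "The capital of France is Paris.",
              "vec": [0.8, 0.3, 0.3], "meta": {"kind": "statement"}},
    "population": {"text": "What is the population of France?",
                   "vec": [0.9, 0.2, 0.1], "meta": {"kind": "question"}},
    "berlin": {"text": "The capital of Germany is Berlin.",
               "vec": [0.2, 0.9, 0.3], "meta": {"kind": "statement"}},
}
QUERY = "capital of France"
QUERY_VEC = [1.0, 0.2, 0.0]  # toy query embedding (assumption)


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_score(query, text):
    # Fraction of query tokens present in the text (crude stand-in for BM25).
    q, t = set(tokenize(query)), set(tokenize(text))
    return len(q & t) / len(q) if q else 0.0


def rank(scores):
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rrf(rankings, k=60):
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per doc.
    fused = defaultdict(float)
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return rank(fused)


def hybrid_search(query, query_vec, docs, where=None):
    # Optional metadata filter is applied before either ranker runs.
    pool = {i: d for i, d in docs.items() if where is None or where(d["meta"])}
    by_vector = rank({i: cosine(query_vec, d["vec"]) for i, d in pool.items()})
    by_lexical = rank({i: lexical_score(query, d["text"]) for i, d in pool.items()})
    return by_vector, by_lexical, rrf([by_vector, by_lexical])


if __name__ == "__main__":
    vec_rank, lex_rank, fused = hybrid_search(QUERY, QUERY_VEC, DOCS)
    print("vector :", vec_rank)
    print("lexical:", lex_rank)
    print("fused  :", fused)
```

With these toy values, the vector ranking alone places the off-target `population` document first, while the fused ranking places `paris` first. Passing `where=lambda meta: meta["kind"] == "statement"` drops the question-style document before ranking, which illustrates the metadata filtering step. The fusion constant `k=60` is a conventional default and would need tuning on real data.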
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:06:32.645884+00:00— report_created — created