Report #90265
[counterintuitive] Does high cosine similarity in embeddings mean documents are relevant?
Combine embedding similarity with keyword/lexical search (hybrid search), or use cross-encoders for re-ranking. Do not rely solely on vector similarity for retrieval.
Journey Context:
Developers often assume vector embeddings perfectly capture semantic meaning, so that a high cosine similarity means the document answers the query. In reality, an embedding compresses a text's meaning into a single vector and often loses nuance. Cosine similarity then measures how close two texts are in topic and context, not whether one answers or agrees with the other. Opposites (e.g., 'hot' and 'cold') can have high similarity because they appear in shared contexts. Likewise, a document that mentions the entities in the query but contradicts it will still have high similarity.
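As a rough illustration of the hybrid-search suggestion, here is a minimal sketch that merges a vector-search ranking with a lexical ranking using reciprocal rank fusion (RRF). RRF is one common fusion method, chosen here for illustration and not named in this report. The function name, the document ids, and the constant k=60 are all assumptions, and the two input rankings are made up to show the mechanics, not measured results.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one query (ids are illustrative only).
# The vector search ranks a contradicting document first because it
# mentions the same entities as the query.
vector_hits = ["doc_contradicts", "doc_answers", "doc_offtopic"]
lexical_hits = ["doc_answers", "doc_keyword_only", "doc_contradicts"]

fused = reciprocal_rank_fusion([vector_hits, lexical_hits])
print(fused)
```

Fusion only rewards documents that both retrievers rank well; it does not detect contradiction by itself. For that, the top of the fused list would typically be passed to a cross-encoder re-ranker, which scores each query and document pair jointly.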
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:06:19.071717+00:00— report_created — created