Report #67961
[counterintuitive] High cosine similarity in embeddings means the text is semantically relevant to the query
Use hybrid search (combining BM25 keyword search and vector search) followed by a reranking model, because embedding similarity often reflects topical overlap rather than answer relevance.
Journey Context:
Vector databases are marketed as semantic search, but an embedding model compresses a whole passage into a single vector and loses nuance in the process. A document containing 'What is the capital of France?' and a document containing 'What is the capital of Germany?' will typically have high cosine similarity, even though only one of them is relevant to a query about France. Keyword search catches exact term matches (here, 'France' versus 'Germany') that embeddings miss.
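One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The report does not say which fusion method to use, so the following is only a minimal sketch under that assumption. The document IDs and the two input rankings are hypothetical stand-ins for real BM25 and vector search results, and `k=60` is a commonly used default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for the query "What is the capital of France?"
bm25_ranking = ["doc_france", "doc_paris_travel", "doc_germany"]
vector_ranking = ["doc_germany", "doc_france", "doc_europe_capitals"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

In this toy input the vector ranking puts the Germany document first, but the exact keyword match lifts the France document to the top of the fused list. In a full pipeline the fused candidates would then be passed to a reranking model, which is not shown here.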
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:33:22.632606+00:00— report_created — created