Report #79412
[counterintuitive] Does high cosine similarity mean the document is relevant to the query?
Combine embedding similarity with keyword/lexical search (hybrid search) and use a cross-encoder re-ranker, rather than relying solely on vector distance.
Journey Context:
Developers often assume that vector embeddings capture semantic meaning perfectly, so the nearest vectors by cosine distance must be the best answers. But an embedding compresses a passage into a single vector and often loses nuance, specific proper nouns, and exact-match terms. A document can score high on cosine similarity because it discusses the same general topic as the query while completely failing to answer the specific question asked. Lexical search recovers the exact terms that the embedding blurred, and a cross-encoder re-ranker scores each query-document pair jointly, so hybrid search plus re-ranking helps close this gap.
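A minimal sketch of the idea, not a definitive implementation. It assumes reciprocal rank fusion (RRF) as the way to merge the vector and lexical rankings, which is one common choice and not something this report prescribes. The function names, the toy term-overlap scorer (a stand-in for a real lexical ranker such as BM25), the constant k=60, and the `rerank` callable (a placeholder for a real cross-encoder) are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list (RRF).

    k=60 is a commonly used default, assumed here; tune it for real data.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def lexical_rank(query, docs):
    """Rank doc ids by how many query terms they contain.

    Toy stand-in for a real lexical ranker such as BM25.
    """
    terms = set(query.lower().split())

    def overlap(text):
        return len(terms & set(text.lower().split()))

    return sorted(docs, key=lambda d: (-overlap(docs[d]), d))


def hybrid_search(query, docs, vector_ranking, rerank=None, top_k=3):
    """Fuse vector and lexical rankings, then optionally re-rank.

    vector_ranking: doc ids ordered by embedding similarity (computed elsewhere).
    rerank: hypothetical callable (query, text) -> float, standing in for a
            cross-encoder that scores the query and document together.
    """
    fused = reciprocal_rank_fusion([vector_ranking, lexical_rank(query, docs)])
    candidates = fused[:top_k]
    if rerank is not None:
        candidates = sorted(candidates, key=lambda d: -rerank(query, docs[d]))
    return candidates


# Illustrative data: doc "a" is on-topic but does not answer the query.
DOCS = {
    "a": "general overview of database indexing",
    "b": "how to rebuild index in postgres 16",
    "c": "cooking pasta at home",
    "d": "postgres release notes",
}
QUERY = "rebuild index postgres 16"
VECTOR_RANKING = ["a", "b", "d", "c"]  # assumed output of a vector search

print(hybrid_search(QUERY, DOCS, VECTOR_RANKING))
```

In this toy data the vector ranking alone puts the topical overview "a" first, while the fused ranking moves "b", the document that contains the exact query terms, to the top. Re-ranking only the fused top_k candidates keeps the cost of the slower pairwise scorer bounded.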
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:53:29.638539+00:00— report_created — created