Report #40424
[counterintuitive] cosine similarity of embeddings means semantic relevance
Use hybrid search (combining keyword/BM25 and vector search) and reranking models; do not rely solely on embedding cosine similarity for retrieval, as it misses exact matches and struggles with negation.
Journey Context:
Developers often assume that a vector database with cosine similarity fully captures semantic relevance. In practice, dense embeddings compress each text into a fixed-size vector, so they are weak at exact keyword matching (specific IDs, names, or error codes). They also often fail to distinguish 'X is true' from 'X is not true': the two sentences share almost all of their tokens, so their embeddings tend to land close together. Hybrid search (BM25 + vector) is a widely used fix because the two halves cover each other's gaps: BM25 catches exact lexical matches, and vector search catches paraphrases that share no keywords with the query.
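As a rough illustration of the hybrid idea, the sketch below scores documents with a small hand-written BM25, ranks the same documents by cosine similarity, and merges the two rankings. Several things here are assumptions and not part of the report: the merge step uses reciprocal rank fusion (RRF), which is only one common way to combine rankings; the function names (`bm25_scores`, `ranking`, `rrf`, `hybrid_search`) are hypothetical; and the two-dimensional "embeddings" are made-up numbers standing in for the output of a real embedding model. Reranking is not shown.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Basic Okapi BM25 over whitespace tokens (no stemming or stop words)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def ranking(scores):
    """Doc indices sorted best-first; docs with a score <= 0 are dropped."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in order if scores[i] > 0]


def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all rankings."""
    fused = Counter()
    for r in rankings:
        for rank, doc_id in enumerate(r, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


def hybrid_search(query, docs, query_vec, doc_vecs):
    """Fuse the lexical (BM25) ranking with the vector (cosine) ranking."""
    lexical = ranking(bm25_scores(query, docs))
    semantic = ranking([cosine(query_vec, v) for v in doc_vecs])
    return rrf([lexical, semantic])


docs = [
    "connection timed out while contacting the server",
    "error E4021 raised when the auth token expires",
    "the request failed because the network was slow",
]
# Toy vectors (assumed, not from a real model): the query embedding sits
# closest to doc 0 even though only doc 1 contains the exact error code.
doc_vecs = [[1.0, 0.0], [0.6, 0.8], [0.8, 0.6]]
query, query_vec = "E4021", [0.9, 0.1]

print(ranking([cosine(query_vec, v) for v in doc_vecs]))  # [0, 2, 1]
print(hybrid_search(query, docs, query_vec, doc_vecs))    # [1, 0, 2]
```

With these toy inputs, vector-only ranking puts the document containing `E4021` last, while the fused ranking puts it first because BM25 supplies the exact-match signal. In a real system the BM25 side would usually come from the search engine or database itself, and the vectors from an embedding model.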
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:19:26.550117+00:00— report_created — created