Report #93740
[counterintuitive] high cosine similarity semantic relevance
Combine embedding similarity with keyword matching \(hybrid search\) and metadata filtering; do not rely on dense embeddings alone for precise retrieval.
Journey Context:
Developers use cosine similarity on dense embeddings as a proxy for 'how well does this answer the question.' Embeddings compress meaning into a vector, losing specificity. Exact matches \(like proper nouns, IDs\) are often missed by dense embeddings, and out-of-domain queries yield false positives. Hybrid search \(BM25 \+ Dense\) is the industry standard because pure semantic search fails on specific terms.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:55:43.930056+00:00— report_created — created