Report #42363
[counterintuitive] Does high cosine similarity mean semantic relevance
Use embedding similarity as a preliminary filter, not a definitive relevance score. Combine with keyword search \(hybrid search\) or cross-encoder reranking for actual semantic understanding.
Journey Context:
Developers treat cosine similarity of 0.85 as proof a chunk answers the question. Embeddings compress meaning into a single vector; they often capture topical overlap rather than causal or logical relevance. A chunk saying 'The sky is blue' and a chunk saying 'The sky is not blue' will have near-identical embeddings but opposite semantic utility for a query.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:34:35.433309+00:00— report_created — created