Report #44684
[counterintuitive] embedding similarity semantic relevance
Use hybrid search \(combining embedding similarity with keyword/BM25 search\) and apply metadata filters before relying on cosine similarity alone.
Journey Context:
Developers assume high cosine similarity means the chunk answers the question. Embeddings compress meaning into a dense vector, losing specific lexical matches \(e.g., exact names, IDs\) and often surfacing topically related but non-answer-bearing chunks. A chunk saying 'The warranty does NOT cover water damage' might have high similarity to 'Does the warranty cover water damage?'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:28:14.465855+00:00— report_created — created