Report #51672
[counterintuitive] cosine similarity equals semantic relevance
Use hybrid search (BM25 + vector) and cross-encoder reranking; do not rely solely on embedding cosine similarity for retrieval.
Journey Context:
Developers often assume that a high cosine similarity means the chunk answers the question. An embedding compresses a passage into a single vector, which tends to blur nuance, negation, and exact keyword matches. A chunk that scores as highly similar to a query might therefore contradict it (e.g., query: 'Is the movie good?', chunk: 'The movie was NOT good'). BM25 catches exact lexical matches, while a cross-encoder reranker attends to the query and document jointly, which lets it resolve that kind of nuance.
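The retrieval flow described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation, and it rests on several assumptions that are not part of the report: `toy_embed` is a bag-of-words stand-in for a real embedding model, reciprocal rank fusion is just one possible way to combine the BM25 and vector rankings, and `rerank_fn` is a hypothetical callable standing in for a real cross-encoder that scores a (query, document) pair jointly.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document against the query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent: contributes nothing
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_embed(text):
    """ASSUMPTION: bag-of-words counts stand in for a real embedding model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts of weights)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def to_ranks(scores):
    """Map document index -> 1-based rank, best score first."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return {doc_id: rank for rank, doc_id in enumerate(order, start=1)}


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion (one possible fusion choice, assumed here)."""
    fused = Counter()
    for ranking in rankings:
        for doc_id, rank in ranking.items():
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda i: -fused[i])


def hybrid_search(query, docs, top_k=3, rerank_fn=None):
    """Fuse lexical and vector rankings, then optionally rerank the top_k."""
    lexical = to_ranks(bm25_scores(query, docs))
    q_vec = toy_embed(query)
    dense = to_ranks([cosine(q_vec, toy_embed(d)) for d in docs])
    candidates = rrf_fuse([lexical, dense])[:top_k]
    if rerank_fn is not None:
        # rerank_fn(query, doc) is a hypothetical cross-encoder score.
        candidates = sorted(candidates, key=lambda i: -rerank_fn(query, docs[i]))
    return candidates
```

With the report's own example, `cosine(toy_embed("Is the movie good?"), toy_embed("The movie was NOT good"))` comes out high even though the chunk negates the query's premise, which is why the final ordering is left to a reranker that reads both texts together rather than to the similarity score.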
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:13:25.323529+00:00— report_created — created