Report #71234
[counterintuitive] Is cosine similarity of embeddings sufficient for retrieving relevant RAG context?
Combine dense vector search with sparse retrieval (BM25/keyword search) in a hybrid approach, and use cross-encoders or re-rankers to evaluate actual semantic relevance before passing documents to the LLM.
Journey Context:
Developers often assume that a document whose embedding has high cosine similarity to the query must answer the user's question. An embedding compresses a text's meaning into a single vector, often losing nuance, specific entity names, or negation: a document stating 'not X' can still score high against a query about 'X'. Hybrid search recovers exact keyword matches that dense vectors miss, and re-rankers score the query and document jointly rather than comparing two independently computed vectors.
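A minimal sketch of the hybrid step, assuming a toy corpus and hand-made 2-dimensional vectors standing in for real embeddings (all documents, vectors, and function names here are hypothetical). It scores documents with a small BM25 implementation (the `log(1 + ...)` IDF variant), ranks them separately by cosine similarity, and merges the two rankings with reciprocal rank fusion, which is one common fusion choice and not something the report prescribes. The re-ranking step is left out because it needs a trained model; the fused list is the candidate pool a cross-encoder would then re-score.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranking(scores):
    """Document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all input rankings."""
    fused = Counter()
    for order in rankings:
        for rank, doc_id in enumerate(order, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)

# Hypothetical corpus: documents 0 and 1 differ only in the error code.
docs = [
    "how to fix error E4012 connection timeout",
    "how to fix error E4021 connection refused",
    "monthly billing invoices are sent by email",
    "error log rotation",
]
# Hand-made stand-ins for embeddings; a real system would call an embedding model.
doc_vecs = [[0.8, 0.2], [0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
query = "error E4012"
query_vec = [0.9, 0.1]

sparse_rank = ranking(bm25_scores(query, docs))
dense_rank = ranking([cosine(query_vec, v) for v in doc_vecs])
fused = rrf([sparse_rank, dense_rank])

print("sparse:", sparse_rank)
print("dense: ", dense_rank)
print("fused: ", fused)
```

In this toy setup the dense ranking puts the wrong error code (document 1) first, while BM25 rewards the exact `E4012` match, and the fused ranking moves document 0 to the top. The margin is small, which is why a joint query-document re-ranker is still worth running on the fused candidates.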
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:08:36.972392+00:00— report_created — created