Report #69468
[counterintuitive] Is cosine similarity on embeddings enough for RAG retrieval?
Combine embedding similarity with keyword/lexical search (hybrid search) and re-ranking (e.g., cross-encoders) for robust retrieval.
Journey Context:
Developers often assume that dense vector embeddings capture all semantic meaning. In practice, embeddings struggle with exact matches (names, IDs, specific acronyms) and can miss the nuance of a query when the relevant document uses synonymous wording that lands far from the query in embedding space. Hybrid search (BM25 + dense) captures both exact lexical matches and semantic similarity, and a re-ranker then settles the final ordering of the merged candidates.
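A minimal sketch of the hybrid idea, under stated assumptions: the report does not say how the BM25 and dense result lists should be merged, so this example uses reciprocal rank fusion (RRF) as one common choice. The documents, the query, and the three-dimensional "embedding" vectors are all made up for illustration. The vectors are hand-picked to mimic a dense model that ranks a semantically nearby document above the one containing the exact ID. A real system would get its vectors from an embedding model and would pass the fused top results to a cross-encoder re-ranker, which is not shown here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately naive)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def ranking(scores):
    """Document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all input rankings."""
    fused = Counter()
    for r in rankings:
        for rank, doc_id in enumerate(r, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


# Hypothetical corpus and query; doc 1 is the only one with the exact ID.
docs = [
    "Reset your password from the account settings page",
    "Error code E4021 means the payment gateway timed out",
    "How to recover access when you forget your login credentials",
]
query = "E4021 timeout"

# Made-up vectors standing in for embedding-model output (assumption).
doc_vecs = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.3], [0.1, 0.6, 0.8]]
query_vec = [0.2, 0.7, 0.7]

lexical_rank = ranking(bm25_scores(query, docs))
dense_rank = ranking([cosine(query_vec, v) for v in doc_vecs])
fused_rank = rrf_fuse([lexical_rank, dense_rank])

print("BM25 :", lexical_rank)
print("Dense:", dense_rank)
print("Fused:", fused_rank)
```

With these toy inputs, the dense ranking alone puts the exact-ID document second, while BM25 puts it first and the fused list keeps it on top. RRF is used because it needs only ranks, so the BM25 and cosine scores never have to be put on a common scale; a weighted score sum is an alternative if the scores are normalized first.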
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:05:18.073877+00:00— report_created — created