Report #54766
[counterintuitive] Is cosine similarity the best metric for RAG retrieval?
Prefer hybrid search (BM25 combined with dense vector search) or learned sparse retrieval over pure dense cosine similarity for knowledge-heavy RAG pipelines.
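A minimal sketch of one common way to combine the two signals. It assumes reciprocal rank fusion (RRF) as the fusion step and uses toy, hand-written 3-d vectors in place of real embeddings. `bm25_scores`, `rank_by`, and `rrf` are hypothetical helpers written for illustration, not a library API. A production pipeline would more likely use a search engine's BM25 and an embedding model. RRF's k=60 is a commonly used default, not a tuned value. Learned sparse retrieval is not shown.

```python
import math
import re
from collections import Counter

# Keep hyphenated identifiers such as "xk-4417" as a single token.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 using the non-negative (Lucene-style) IDF variant."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        length_norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_by(scores: list[float], drop_zero: bool = False) -> list[int]:
    # Indices by descending score; ties broken by lower index for determinism.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Optionally drop docs with no lexical match, as a BM25 engine would.
    return [i for i in order if not (drop_zero and scores[i] <= 0)]


def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    # Reciprocal rank fusion: each ranking adds 1 / (k + rank), ranks start at 1.
    fused: dict[int, float] = {}
    for ranking in rankings:
        for r, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + r)
    return sorted(fused, key=lambda d: (-fused[d], d))


docs = [
    "Reset instructions for router model XK-4417",
    "How to reset a wireless router to factory settings",
    "Restoring default configuration on home network devices",
    "Troubleshooting slow wifi connections at home",
]
# Hypothetical "embeddings"; a real pipeline would get these from an embedding model.
doc_vecs = [[0.6, 0.5, 0.3], [0.95, 0.1, 0.0], [0.9, 0.3, 0.0], [0.1, 0.2, 0.9]]
query = "XK-4417 reset"
query_vec = [0.9, 0.1, 0.1]

dense = rank_by([cosine(query_vec, v) for v in doc_vecs])
sparse = rank_by(bm25_scores(query, docs), drop_zero=True)
hybrid = rrf([dense, sparse])
print("dense: ", dense)
print("bm25:  ", sparse)
print("hybrid:", hybrid)
```

With these toy vectors, dense retrieval alone ranks the document containing the exact ID `XK-4417` third (`[1, 2, 0, 3]`). BM25 ranks it first (`[0, 1]`), and fusion lifts it into the top two (`[1, 0, 2, 3]`). RRF combines ranks rather than raw scores. This avoids having to calibrate BM25 scores against cosine similarities, which live on different scales.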
Journey Context:
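Developers often default to cosine similarity, assuming it handles semantic search best. In practice the weakness lies less in the metric than in relying on dense vectors alone. On unit-normalized embeddings, cosine similarity, dot product, and Euclidean distance all produce the same ranking, because ||a - b||² = 2 - 2·cos(a, b). Dense retrieval struggles with exact keyword matches such as product IDs, names, and specific acronyms. Its cosine scores are also distorted by the anisotropy of embedding spaces: vectors cluster in a narrow cone, so even unrelated texts can score high. Hybrid search adds a lexical signal such as BM25 and typically outperforms pure dense retrieval in real-world RAG, especially on queries that mix semantic intent with exact terms.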
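To illustrate what anisotropy does to cosine scores, here is a small sketch on synthetic data. The data is an assumption: Gaussian noise added to a shared offset, not output from a real embedding model. When every vector shares a dominant direction, even vectors whose remaining components are independent noise score close to 1, so the useful signal is squeezed into a narrow band. Mean-centering removes that shared offset. It is one commonly discussed mitigation, related to whitening-style post-processing. `mean_center` and `mean_pairwise_cosine` are hypothetical helpers written for this example.

```python
import math
import random
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_cosine(vectors: list[list[float]]) -> float:
    # Average cosine over all unordered pairs of distinct vectors.
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def mean_center(vectors: list[list[float]]) -> list[list[float]]:
    # Subtract the per-dimension mean so the shared offset disappears.
    dim = len(vectors[0])
    mean = [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]
    return [[x - m for x, m in zip(vec, mean)] for vec in vectors]


rng = random.Random(0)
DIM, N_VECS = 32, 50
# Synthetic "anisotropic" vectors: a large shared offset along one axis plus small independent noise.
shared = [10.0] + [0.0] * (DIM - 1)
vectors = [[s + rng.gauss(0.0, 0.5) for s in shared] for _ in range(N_VECS)]

raw = mean_pairwise_cosine(vectors)
centered = mean_pairwise_cosine(mean_center(vectors))
print(f"mean pairwise cosine, raw:      {raw:.3f}")
print(f"mean pairwise cosine, centered: {centered:.3f}")
```

On this synthetic data the raw mean pairwise cosine is high (roughly 0.9), while the centered value is close to 0. With real embeddings, the centering statistics would have to be estimated on the corpus and applied to queries as well. Treat that as something to evaluate on your own retrieval set, not a guaranteed improvement.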
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:25:12.914788+00:00— report_created — created