Report #96749
[counterintuitive] Is high embedding cosine similarity a reliable indicator of semantic relevance for RAG?
Use cosine similarity as a coarse filter, but follow it with a cross-encoder reranker or an LLM-based relevance check before passing chunks to the generation model.
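A minimal sketch of this two-stage pattern, assuming you already have query and chunk vectors from some embedding model. `retrieve_then_rerank`, `rerank_fn`, `coarse_k`, `final_k` and `min_rerank_score` are hypothetical names chosen for illustration. `rerank_fn` is any callable that scores a (query, chunk) pair. In practice it might wrap a cross-encoder (for example, sentence-transformers' `CrossEncoder.predict`) or an LLM relevance prompt. Here it is stubbed with hand-assigned toy scores so the sketch runs on its own.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    rerank_fn: Callable[[str, str], float],  # hypothetical: wraps a cross-encoder or LLM check
    coarse_k: int = 20,
    final_k: int = 3,
    min_rerank_score: float = 0.0,
) -> List[Tuple[str, float]]:
    # Stage 1: cheap coarse filter, keep the coarse_k chunks closest by cosine.
    coarse = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)[:coarse_k]
    # Stage 2: rescore each surviving (query, chunk) pair with the slower relevance model.
    scored = [(text, rerank_fn(query, text)) for text, _ in coarse]
    # Drop chunks the reranker judges irrelevant, then keep the best final_k.
    kept = [s for s in scored if s[1] > min_rerank_score]
    kept.sort(key=lambda s: s[1], reverse=True)
    return kept[:final_k]


# Toy usage: 2-D "embeddings" and hand-assigned scores are hypothetical stand-ins
# for a real embedding model and reranker.
toy_chunks = [
    ("Revenue rose 12% in fiscal 2023.", [0.8, 0.3]),
    ("Our revenue recognition policy is described in Note 2.", [0.95, 0.1]),
    ("The cafeteria added a salad bar.", [0.1, 0.9]),
]
toy_scores = {toy_chunks[0][0]: 4.0, toy_chunks[1][0]: -3.0, toy_chunks[2][0]: -9.0}
toy_result = retrieve_then_rerank(
    "Did the company increase revenue?",
    [1.0, 0.0],
    toy_chunks,
    rerank_fn=lambda q, t: toy_scores[t],
    coarse_k=2,
    final_k=2,
)
print(toy_result)
```

In this toy data the policy chunk has the highest cosine score, but it does not answer the question. The reranker stage is what filters it out. The coarse stage stays cheap because only `coarse_k` chunks ever reach the expensive model.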
Journey Context:
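Developers often assume that a chunk with high cosine similarity to the query answers the question. But an embedding compresses a passage into a single vector, losing nuance, negation, and temporal ordering. A chunk saying 'The company did NOT increase revenue' typically scores high against 'Did the company increase revenue?' because the two share almost all of their topical content. The similarity score reflects that overlap, not whether the chunk affirms or denies the claim. Relying solely on embedding distance therefore retrieves anti-facts (passages that contradict what they appear to support) and topically similar but irrelevant noise, which can severely degrade RAG answer quality.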
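To make the negation point concrete, here is a toy illustration. It uses a bag-of-words count vector, not a neural embedding model, so the exact numbers will not match any real model. It only shows how a single negation token barely moves a vector dominated by shared topic words. `bow` and `bow_cosine` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    # Lowercase and count word tokens: a toy stand-in for an embedding vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def bow_cosine(a: str, b: str) -> float:
    # Cosine similarity between the two bag-of-words count vectors.
    va, vb = bow(a), bow(b)
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


query = "Did the company increase revenue?"
affirm = "The company did increase revenue."
negated = "The company did NOT increase revenue."
print(round(bow_cosine(query, affirm), 3))   # 1.0
print(round(bow_cosine(query, negated), 3))  # 0.913
```

The affirming and the negated chunk end up with nearly the same score, even though they give opposite answers. This is the gap that a pairwise reranker or LLM check is meant to cover.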
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:58:44.490269+00:00— report_created — created