Report #48884
[counterintuitive] Does high cosine similarity mean the document answers the question?
Use a cross-encoder reranker after initial dense retrieval; do not rely solely on embedding cosine similarity for final context selection.
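A minimal sketch of this two-stage pipeline follows. `retrieve_then_rerank`, `embed`, `score_pair`, `k_retrieve`, and `k_final` are hypothetical names. `embed` stands in for any bi-encoder and `score_pair` for any cross-encoder. The default cutoffs (50 and 5) are illustrative, not tuned values, and the demo scores are made up so the sketch runs without a model.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query: str,
    docs: Sequence[str],
    embed: Callable[[str], Vector],           # bi-encoder: one text -> one vector
    score_pair: Callable[[str, str], float],  # cross-encoder: (query, doc) -> relevance
    k_retrieve: int = 50,
    k_final: int = 5,
) -> list[str]:
    # Stage 1 (recall): embed query and documents independently and keep the
    # k_retrieve documents with the highest cosine similarity to the query.
    q_vec = embed(query)
    candidates = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)[:k_retrieve]
    # Stage 2 (precision): score each (query, doc) pair jointly; this ordering,
    # not the cosine ordering, decides which documents become context.
    reranked = sorted(candidates, key=lambda d: score_pair(query, d), reverse=True)
    return reranked[:k_final]


if __name__ == "__main__":
    # Hypothetical stand-in scores so the sketch runs without any model.
    vectors = {"q": [1.0, 0.0], "on-topic, contradicts": [0.99, 0.14],
               "answers the question": [0.8, 0.6], "unrelated": [0.0, 1.0]}
    pair_scores = {"on-topic, contradicts": 0.1, "answers the question": 0.9, "unrelated": 0.0}
    docs = [d for d in vectors if d != "q"]
    print(retrieve_then_rerank("q", docs, vectors.__getitem__,
                               lambda q, d: pair_scores[d], k_retrieve=2, k_final=1))
    # prints ['answers the question']
```

Here the cosine stage ranks the contradicting document first, and only the rerank stage promotes the document that answers the question. In practice, document vectors would normally be precomputed in a vector index, `embed` would be a real bi-encoder (for example sentence-transformers' `SentenceTransformer.encode`), and `score_pair` a cross-encoder such as sentence-transformers' `CrossEncoder`, whose `predict` method accepts a list of (query, document) pairs, so scoring all candidates in one batched call is cheaper than the per-pair calls in this sketch.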
Journey Context:
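Developers often assume that vector search (bi-encoder embeddings) fully captures semantic relevance. However, a bi-encoder encodes the query and each document independently and compresses each into a single vector, which loses nuance such as negation, word order, and how the entities relate to each other. As a result, it often retrieves documents that mention the query's entities but contradict its premise, or that are topically similar but factually irrelevant. A cross-encoder instead processes the query and document together in a single pass, so it can model the interaction between them, which often improves precision substantially. Because it must run once per (query, document) pair, it is too expensive to score a whole corpus and is applied only to the top candidates returned by the dense search.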
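A deliberately extreme toy of what single-vector compression can discard is shown below. `bow_embed` is a hypothetical bag-of-words encoder, not a neural bi-encoder; real embeddings retain far more than word counts, so treat this only as an illustration of the failure mode described above: a document that contradicts the query's premise can outscore one that supports it.

```python
import math
from collections import Counter


def bow_embed(text: str) -> Counter:
    """Hypothetical toy encoder: lowercase word counts. Order and scope are lost."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "does aspirin cure headaches"
supports = "aspirin can cure headaches"
contradicts = "aspirin does not cure headaches"

q = bow_embed(query)
print(f"supports:    {cosine(q, bow_embed(supports)):.3f}")     # prints supports:    0.750
print(f"contradicts: {cosine(q, bow_embed(contradicts)):.3f}")  # prints contradicts: 0.894
# Word order is invisible too: these two opposite statements get identical vectors.
print(f"{cosine(bow_embed('man bites dog'), bow_embed('dog bites man')):.3f}")  # prints 1.000
```

The contradicting document wins because it shares more surface words with the query. A cross-encoder reads the query and document tokens together, so it can, in principle, notice that "not" negates the very claim the query asks about.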
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:32:10.912032+00:00— report_created — created