Report #76026
[counterintuitive] a fixed cosine similarity threshold effectively filters bad RAG documents
Use dynamic thresholds (such as mutual rank or a percentile cutoff) or LLM-based relevance scoring instead, because absolute cosine similarity scores vary widely with the query and the chunk length.
Journey Context:
Developers often set a hard cutoff (e.g., cosine similarity > 0.75) to filter out irrelevant RAG chunks. However, cosine similarity scores are relative rather than absolute: a short, specific query will tend to produce lower scores than a broad, verbose one, even against its best-matching chunks. A fixed threshold will therefore either let in garbage for broad queries or filter out good context for specific queries. Using the ratio of the top chunk's score to the second, or passing the top-K chunks to an LLM for a binary relevance check, is far more robust.
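A minimal sketch of two query-relative cutoffs, to illustrate the idea rather than prescribe an implementation. The function names `filter_relative` and `filter_percentile`, the `ratio` and `pct` defaults, and the example scores are all hypothetical and not taken from any library or from measured data. Note that `filter_relative` compares every chunk to the top score, which is a variation on the top-to-second ratio mentioned above.

```python
from typing import List, Tuple

Scored = List[Tuple[str, float]]  # (chunk_id, cosine similarity) pairs


def filter_relative(scored: Scored, ratio: float = 0.9) -> Scored:
    """Keep chunks scoring at least `ratio` times this query's top score.

    The cutoff moves with each query's score range instead of being fixed.
    `ratio` is an illustrative default (assumption), not a tuned value.
    """
    if not scored:
        return []
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    top_score = ranked[0][1]
    if top_score <= 0:
        # No positive similarity at all: return only the best candidate.
        return ranked[:1]
    return [(cid, s) for cid, s in ranked if s >= ratio * top_score]


def filter_percentile(scored: Scored, pct: float = 0.5) -> Scored:
    """Keep chunks at or above the `pct` percentile of this query's scores."""
    if not scored:
        return []
    ascending = sorted(s for _, s in scored)
    # Simple index-based percentile; good enough for a sketch.
    cutoff = ascending[int(pct * (len(ascending) - 1))]
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [(cid, s) for cid, s in ranked if s >= cutoff]


# Made-up scores: a specific query with low absolute scores, and a broad
# query where every chunk clears a fixed 0.75 cutoff.
specific = [("a", 0.52), ("b", 0.50), ("c", 0.31)]
broad = [("x", 0.88), ("y", 0.80), ("z", 0.78)]

fixed_specific = [cid for cid, s in specific if s > 0.75]
fixed_broad = [cid for cid, s in broad if s > 0.75]
relative_specific = [cid for cid, _ in filter_relative(specific)]
relative_broad = [cid for cid, _ in filter_relative(broad)]
```

With these illustrative numbers, the fixed 0.75 cutoff keeps nothing for the specific query and everything for the broad one, while the relative cutoff keeps the top two chunks in both cases. Either filter can still be followed by an LLM relevance check on the survivors.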
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:12:13.837293+00:00— report_created — created