Report #57043
[synthesis] RAG agent retrieves increasingly irrelevant context without failing
Track the cosine similarity scores of the top retrieved documents over time. If the average score of the top-k results drifts downward, alert on index health or embedding model staleness, even if the LLM still generates an answer.
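A minimal sketch of that check, assuming each retrieval call exposes its top-k similarity scores as a list of floats. The class name `TopKScoreMonitor`, the window sizes, and the `max_drop` threshold of 0.05 are illustrative assumptions, not values from this report; tune them against your own traffic.

```python
from collections import deque
from statistics import mean


class TopKScoreMonitor:
    """Hypothetical helper: compares recent mean top-k similarity to a frozen baseline."""

    def __init__(self, baseline_size=100, window_size=100, max_drop=0.05):
        self.baseline_size = baseline_size
        self.baseline = []                        # first N per-query means, frozen once full
        self.recent = deque(maxlen=window_size)   # rolling window of later per-query means
        self.max_drop = max_drop                  # assumed alert threshold, not a standard value

    def record(self, topk_scores):
        """Record one query's top-k scores; return True when downward drift is detected."""
        if not topk_scores:
            raise ValueError("topk_scores must contain at least one score")
        query_mean = mean(topk_scores)
        if len(self.baseline) < self.baseline_size:
            self.baseline.append(query_mean)      # still collecting the baseline
            return False
        self.recent.append(query_mean)
        if len(self.recent) < self.recent.maxlen:
            return False                          # not enough recent data to judge yet
        return mean(self.baseline) - mean(self.recent) > self.max_drop
```

Call `record()` after every retrieval, regardless of whether generation succeeded, and route a `True` result to whatever alerting channel covers index health.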
Journey Context:
As a vector database grows, the embedding space becomes denser, and retrieval quality can degrade without raising any error, for example when approximate nearest-neighbor recall falls or when queries drift away from what the index and embedding model cover. A query that previously returned a 0.92 similarity hit might now return a 0.81 hit. The LLM will still generate a confident answer from this weaker context, producing subtly incorrect or generic responses. Because the retrieval step 'succeeds' (returns 200, returns N documents) and the generation step 'succeeds', standard metrics stay green. This synthesizes vector index density mechanics with statistical drift monitoring: only by tracking the retrieval score distribution is this silent semantic decay caught before it surfaces as user-facing hallucinations.
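To illustrate why status-based metrics stay green while the score distribution moves, here is a rough sketch that compares recent top-1 scores against a baseline sample using a two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs). The request record shape (`status`, `scores`), the function names, and the `max_ks` threshold of 0.5 are assumptions made for this example only.

```python
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Two-sample KS statistic: maximum gap between the two empirical CDFs."""
    a, b = sorted(baseline), sorted(current)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def retrieval_health(requests, baseline_top1, max_ks=0.5):
    """Report the usual 'green' signal next to the score-distribution signal.

    requests: list of dicts with hypothetical keys 'status' (int) and
    'scores' (top-k similarities, best first).
    """
    status_ok = all(r["status"] == 200 and r["scores"] for r in requests)
    current_top1 = [r["scores"][0] for r in requests if r["scores"]]
    drift = bool(current_top1) and ks_statistic(baseline_top1, current_top1) > max_ks
    return {"status_ok": status_ok, "score_drift": drift}
```

With a baseline of top-1 scores around 0.92 and recent requests that all return 200 but score around 0.81, `status_ok` is `True` while `score_drift` is also `True`, which is the situation this report describes.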
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:14:01.186268+00:00— report_created — created