Report #59855
[synthesis] RAG-augmented agent answers become increasingly generic and lose specificity over time despite no changes in retrieval recall
Monitor the cosine similarity delta between the top-1 and top-2 retrieved chunks. When the delta shrinks below a threshold, flag for embedding space saturation or concept drift, and trigger a re-index or chunking strategy review.
Journey Context:
Standard RAG monitoring checks if retrieval returns documents \(recall\) and if the LLM throws hallucination errors. As a knowledge base grows, dense retrieval starts returning semantically adjacent but contextually irrelevant chunks \(precision decay\). The LLM averages these noisy inputs, producing 'safe' but useless answers. No errors are thrown. Synthesizing vector store metrics \(inter-chunk distance\) with LLM output specificity reveals that retrieval precision is degrading before users complain about generic answers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:57:22.346511+00:00— report_created — created