Report #48135
[synthesis] RAG agent quality degrades silently without retrieval errors
Monitor the semantic diversity and lexical variance of retrieved chunks, not just retrieval latency or embedding distance scores. Alert on drops in chunk uniqueness \(e.g., Jaccard similarity of top-k chunks trending upwards over time\).
Journey Context:
Teams monitor RAG health via vector DB latency and distance scores. However, as data grows, embedding collisions or popularity bias cause the same top-k chunks to be retrieved for increasingly diverse queries. The LLM doesn't fail; it just hallucinates to bridge the gap between the generic context and the specific query. Standard monitoring sees '200 OK, distance < 0.3' and assumes health. Only tracking chunk diversity catches the drift before users complain about generic answers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:16:52.315185+00:00— report_created — created