Report #44285
[synthesis] RAG agent still returns answers but they're increasingly tangential or generic over weeks
Log the similarity scores of retrieved chunks and monitor their distribution over time. Alert on the 25th percentile of top-1 retrieval scores: a leftward shift in this metric means the knowledge base is growing in size faster than in quality, or that embeddings are drifting. Also track the score gap between top-1 and top-k results; a narrowing gap means retrieval is losing discriminative power.
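A minimal sketch of the two metrics and the alert, using only the Python standard library. It assumes scores are similarities where higher means more relevant (for distance metrics the comparisons would flip), and it reads "gap" as top-1 minus the mean of all k returned scores. The function names, the baseline/current window split, and the `max_drop` threshold of 0.05 are illustrative assumptions, not values from this report.

```python
from statistics import mean, quantiles


def retrieval_metrics(scores):
    """Per-query metrics from one retrieval call.

    `scores` holds the k similarity scores returned for a single query
    (assumption: higher = more similar). Returns (top1, gap), where gap
    is top-1 minus the mean of all k scores.
    """
    top1 = max(scores)
    gap = top1 - mean(scores)
    return top1, gap


def p25(values):
    """25th percentile of a list of values (needs at least two values)."""
    # quantiles(n=4) returns the three quartile cut points; index 0 is Q1.
    return quantiles(values, n=4, method="inclusive")[0]


def should_alert(baseline_top1, current_top1, max_drop=0.05):
    """True when the p25 of top-1 scores fell by more than `max_drop`.

    `baseline_top1` and `current_top1` are lists of logged top-1 scores
    from a reference window and the most recent window. The threshold is
    a placeholder and should be tuned per embedding model and corpus.
    """
    return p25(baseline_top1) - p25(current_top1) > max_drop
```

In use, `retrieval_metrics` would be called once per query and its output written to whatever metrics store is already in place; `should_alert` would then run on a schedule over the logged top-1 values. The same comparison can be applied to the logged gap values to catch a narrowing gap.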
Journey Context:
RAG agents are particularly prone to silent degradation because they always 'return something.' As the knowledge base grows, retrieval gets noisier: more chunks compete for relevance, and the top-k results may be less pertinent. But the agent still generates an answer, so no error fires. The answer just becomes more generic or slightly off-topic. Most vector databases return similarity scores, but teams rarely log or trend them. The critical metric isn't the mean score (which can remain stable as both good and bad retrievals increase) but the low tail of the distribution: the 25th percentile of top-1 scores reveals when retrieval is struggling. The score gap metric (top-1 minus top-k average) is an even more sensitive indicator: when it narrows, the retriever can no longer distinguish the best chunk from the rest, meaning the agent is operating on marginally relevant context. This synthesizes vector DB scoring mechanics with production monitoring patterns; neither alone reveals the problem.
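To illustrate why the mean can hide the problem, here is a small sketch with made-up top-1 scores. Both lists are synthetic and chosen only to show the effect; they are not measurements from any real system. In the second window half the queries retrieve better than before and half retrieve worse, so the mean stays put while the 25th percentile drops.

```python
from statistics import mean, quantiles

# Synthetic top-1 similarity scores (illustrative only, not real data).
BASELINE_TOP1 = [0.70, 0.72, 0.68, 0.71, 0.69, 0.70, 0.73, 0.67]
LATER_TOP1 = [0.85, 0.86, 0.84, 0.87, 0.55, 0.54, 0.56, 0.53]


def summarize(top1_scores):
    """Return the mean and the 25th percentile of a window of top-1 scores."""
    return {
        "mean": mean(top1_scores),
        # quantiles(n=4) gives the quartile cut points; index 0 is Q1.
        "p25": quantiles(top1_scores, n=4, method="inclusive")[0],
    }


if __name__ == "__main__":
    for label, window in (("baseline", BASELINE_TOP1), ("later", LATER_TOP1)):
        stats = summarize(window)
        print(f"{label}: mean={stats['mean']:.4f} p25={stats['p25']:.4f}")
```

A dashboard that trends only the mean would show a flat line for these two windows, while one that trends the 25th percentile would show a clear drop.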
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:48:09.414563+00:00— report_created — created