Report #95477
[synthesis] RAG agent outputs remain factually correct but slowly lose specificity and become generic summaries
Track the difference in cosine similarity between the top-1 and top-2 retrieved chunks (the top-1/top-2 margin) over time. Alert when that margin shrinks below a threshold derived from a baseline measurement.
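A minimal sketch of this check, assuming the retriever exposes the query embedding and the candidate chunk embeddings as plain lists of floats. The names `top2_margin` and `MarginMonitor`, the window size, and the 50% ratio are hypothetical choices for illustration, not values from this report.

```python
import math
from collections import deque
from statistics import mean


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top2_margin(query_vec, chunk_vecs):
    """Return (top-1 score, top-1 minus top-2 score) for one query."""
    scores = sorted((cosine(query_vec, c) for c in chunk_vecs), reverse=True)
    if len(scores) < 2:
        raise ValueError("need at least two retrieved chunks")
    return scores[0], scores[0] - scores[1]


class MarginMonitor:
    """Rolling mean of the top-1/top-2 margin, compared to a fixed baseline."""

    def __init__(self, baseline_margin, window=500, min_ratio=0.5):
        self.baseline_margin = baseline_margin  # assumption: measured during a known-good period
        self.min_ratio = min_ratio              # assumption: alert below 50% of baseline
        self.margins = deque(maxlen=window)

    def record(self, margin):
        self.margins.append(margin)

    def should_alert(self):
        # Stay quiet until the window is full, so a few odd queries do not fire.
        if len(self.margins) < self.margins.maxlen:
            return False
        return mean(self.margins) < self.baseline_margin * self.min_ratio
```

Averaging over a window rather than alerting per query is a design choice: individual queries can legitimately have two near-equal candidates, while a sustained drop in the mean is what suggests the knowledge base itself has changed.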
Journey Context:
As a knowledge base grows, new documents introduce semantic overlap with existing ones. The embedding model does not fail, but the top-ranked chunk becomes marginally less distinct from the runner-up. The LLM still generates a correct answer, but it leans on generalized knowledge rather than the specific document. Teams often notice only months later, when the agent has stopped citing specific policies. Absolute similarity scores stay high throughout, so monitoring them alone reveals nothing; the shrinking margin between the top candidates is the signal. This diagnosis is a synthesis of information retrieval ranking dynamics and LLM attention behavior.
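The effect can be illustrated with a toy example. The 3-dimensional vectors below are made-up stand-ins for real embeddings, chosen only to show the pattern: adding one overlapping chunk leaves the top-1 score untouched while the gap to the runner-up collapses.

```python
import math


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top1_and_margin(query, chunks):
    """Top-1 cosine score and its gap to the runner-up."""
    scores = sorted((cosine(query, c) for c in chunks), reverse=True)
    return scores[0], scores[0] - scores[1]


# Toy values for illustration only; real embeddings have hundreds of dimensions.
query = [1.0, 0.0, 0.0]
policy_chunk = [0.9, 0.1, 0.0]          # the specific document we want cited
unrelated_chunk = [0.2, 0.9, 0.1]
overlapping_chunk = [0.85, 0.2, 0.05]   # added later, semantically close to the policy

before = top1_and_margin(query, [policy_chunk, unrelated_chunk])
after = top1_and_margin(query, [policy_chunk, unrelated_chunk, overlapping_chunk])

print(f"before: top-1={before[0]:.3f} margin={before[1]:.3f}")
print(f"after:  top-1={after[0]:.3f} margin={after[1]:.3f}")
```

A dashboard that plots only the top-1 score would show a flat line across this change, which is why the margin needs its own metric.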
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:50:14.662281+00:00— report_created — created