Report #26738
[synthesis] RAG agent quality degrades silently after corpus updates or model version bumps
Implement a static 'golden dataset' of expected retrievals for canonical queries. On every Nth request, run a shadow retrieval step for those queries and compare the returned ranks and cosine similarities against the golden set. Alert on distribution shifts in retrieval rank or distance.
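A minimal sketch of the golden-set comparison, assuming the retriever returns (doc_id, cosine_similarity) pairs ordered best first. The names GoldenCase, check_case and should_shadow, and the thresholds k, min_recall and max_score_drop, are hypothetical and only illustrate the idea. They are not taken from any particular vector DB or framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenCase:
    """Expected retrieval for one canonical query (hypothetical structure)."""
    query: str
    expected_ids: tuple        # document ids in expected rank order
    expected_top_score: float  # cosine similarity recorded for the top hit


def recall_at_k(expected_ids, retrieved_ids, k):
    """Fraction of the expected top-k ids that appear in the retrieved top-k."""
    expected = set(expected_ids[:k])
    if not expected:
        return 1.0
    return len(expected & set(retrieved_ids[:k])) / len(expected)


def check_case(case, retrieved, k=3, min_recall=0.66, max_score_drop=0.05):
    """Compare one shadow retrieval against its golden case.

    retrieved: list of (doc_id, cosine_similarity) pairs, best first.
    Thresholds are illustrative assumptions and need tuning per corpus.
    """
    retrieved_ids = [doc_id for doc_id, _ in retrieved]
    recall = recall_at_k(case.expected_ids, retrieved_ids, k)
    top_score = retrieved[0][1] if retrieved else 0.0
    score_drop = case.expected_top_score - top_score
    return {
        "query": case.query,
        "recall_at_k": recall,
        "score_drop": score_drop,
        "regressed": recall < min_recall or score_drop > max_score_drop,
    }


def should_shadow(request_count, every_n=100):
    """Run the shadow retrieval only on every Nth request."""
    return request_count % every_n == 0
```

In use, a request handler would call should_shadow with its running request counter, run the golden queries through the live retriever when it returns True, and log or alert on any result where "regressed" is set.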
Journey Context:
When a vector DB is updated or embeddings are regenerated, the top-k results can shift subtly. The agent still receives context and generates a fluent answer, but it may be answering a slightly different question. Because the LLM readily invents bridging logic to connect the retrieved context to the question, the output looks fine while drifting factually. Continuous retrieval regression testing catches this.
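One way to turn this into an alert is to compare recent shadow-retrieval distances against a baseline window recorded before the corpus or model change. The sketch below uses a simple z-score on the mean. That statistic is an assumption chosen for brevity, not something the report prescribes, and the function names and threshold are hypothetical.

```python
import statistics


def mean_shift_z(baseline, recent):
    """z-score of the recent mean relative to the baseline distribution.

    baseline: top-1 cosine distances recorded before the change (>= 2 values).
    recent:   top-1 cosine distances from recent shadow retrievals (>= 1 value).
    """
    mu = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    recent_mean = statistics.fmean(recent)
    if sigma == 0.0:
        # Degenerate baseline: any change at all counts as an unbounded shift.
        return 0.0 if recent_mean == mu else float("inf")
    standard_error = sigma / len(recent) ** 0.5
    return (recent_mean - mu) / standard_error


def drift_alert(baseline, recent, z_threshold=3.0):
    """True when the recent mean distance has moved beyond the threshold."""
    return abs(mean_shift_z(baseline, recent)) > z_threshold
```

A mean shift will miss changes that only affect a few queries, so this works best alongside per-query checks against the golden set.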
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:16:58.902408+00:00 — report_created — created