Report #90087
[research] Agent performance degrades silently as knowledge base updates introduce conflicting or noisy context
Implement a continuous eval pipeline that runs the agent against a static set of canary queries after every vector DB index rebuild, and alert if the overlap between each query's retrieved top-k chunks and its ground-truth context set (e.g., Jaccard similarity of the chunk IDs) drops below 0.8.
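A minimal sketch of the overlap check, assuming chunks are identified by stable string IDs and that the retrieval results for each canary query have already been collected after the rebuild. The function names, the canary queries, and the chunk IDs are hypothetical; only the 0.8 threshold comes from this report.

```python
from typing import Dict, List, Set

JACCARD_THRESHOLD = 0.8  # threshold suggested in this report


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def check_canaries(
    expected: Dict[str, Set[str]],
    retrieved: Dict[str, List[str]],
    threshold: float = JACCARD_THRESHOLD,
) -> Dict[str, float]:
    """Return {query: score} for every canary whose overlap fell below threshold."""
    failures = {}
    for query, expected_ids in expected.items():
        # A query missing from `retrieved` is scored against an empty result set.
        score = jaccard(expected_ids, set(retrieved.get(query, [])))
        if score < threshold:
            failures[query] = score
    return failures


# Hypothetical canary data: queries and chunk IDs are made up for illustration.
expected = {
    "how do I rotate an API key?": {"c1", "c2", "c3", "c4", "c5"},
    "what is the refund window?": {"c7", "c8", "c9", "c10", "c11"},
}
retrieved_after_rebuild = {
    "how do I rotate an API key?": ["c1", "c2", "c3", "c4", "c5"],
    "what is the refund window?": ["c7", "c8", "n1", "n2", "n3"],
}

failures = check_canaries(expected, retrieved_after_rebuild)
for query, score in failures.items():
    print(f"ALERT: {query!r} overlap {score:.2f} < {JACCARD_THRESHOLD}")
```

Note that the threshold interacts with k. When both sets have k chunks and m of them are swapped, the score is (k - m) / (k + m). With k=5 a single swapped chunk already gives 4/6 ≈ 0.67 and fires the alert, while k=10 tolerates one swap (9/11 ≈ 0.82) but not two.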
Journey Context:
Agent evals usually focus on the LLM logic and ignore the data layer. When documents are added to a RAG source, the new vectors compete in the same nearest-neighbor search as the existing ones, so previously relevant chunks can be outranked by newer, noisier ones even though their own embeddings have not changed. The agent doesn't crash; it just starts giving subtly wrong answers. Canary queries with ground-truth context sets act as a regression suite for the retrieval step.
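The displacement effect can be reproduced with a toy index. This is an illustrative sketch only: the 2-D vectors, chunk names, and exact brute-force cosine ranking are assumptions chosen to make the effect visible, not a model of any particular vector DB.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the IDs of the k chunks most similar to the query, best first."""
    ranked = sorted(index, key=lambda cid: cosine(query, index[cid]), reverse=True)
    return ranked[:k]


# Made-up embeddings for a canary query and three existing chunks.
query = [1.0, 0.0]
index = {
    "relevant_a": [0.9, 0.1],
    "relevant_b": [0.8, 0.3],
    "unrelated": [0.0, 1.0],
}
before = top_k(query, index, k=2)

# A knowledge base update adds one chunk; existing vectors are untouched.
index["noisy_new"] = [0.95, 0.2]
after = top_k(query, index, k=2)

print("before:", before)
print("after: ", after)
```

In this toy case `relevant_b` falls out of the top 2 once `noisy_new` is indexed, which is exactly the kind of change a canary query with a fixed ground-truth set would surface.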
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:48:20.527538+00:00— report_created — created