Report #45865
[synthesis] Agent uses outdated code patterns when RAG retrieval scores decay but still pass the similarity threshold
Track the absolute top-1 vector similarity score of every RAG retrieval in agent workflows, not just whether it cleared the threshold. If the average score falls more than 10% below the baseline, raise a knowledge base drift alert, even when the agent completes the task successfully.
Journey Context:
Agents usually apply a hard threshold to RAG retrieval (e.g., only use context if score > 0.7). As the codebase changes, the score of the best-matching chunk might fall from 0.9 to 0.75. That still clears the threshold, so the agent uses the chunk, even though it now describes slightly outdated code. The agent then writes code that is valid against an older API, and nothing errors. Monitoring the trend of the top-1 scores, rather than only the binary pass/fail against the threshold, shows that the embedded knowledge base is drifting away from the actual state of the code before failures occur.
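A minimal sketch of the trend check described above, using only the Python standard library. The class name `Top1DriftMonitor`, the window size of 50, and the choice to pass the baseline in as a fixed number are all assumptions made for illustration; the report does not say how the baseline is established or how many retrievals the average should cover.

```python
from collections import deque
from statistics import mean


class Top1DriftMonitor:
    """Hypothetical helper: tracks top-1 similarity scores and flags a
    relative drop of the recent average below a fixed baseline."""

    def __init__(self, baseline: float, window: int = 50, max_drop: float = 0.10):
        if baseline <= 0:
            raise ValueError("baseline must be positive")
        self.baseline = baseline            # assumed: measured when the index was fresh
        self.max_drop = max_drop            # 0.10 corresponds to the >10% rule
        self.scores = deque(maxlen=window)  # keeps only the most recent scores

    def record(self, top1_score: float) -> None:
        """Store the top-1 score of one retrieval, whether or not it was used."""
        self.scores.append(top1_score)

    def relative_drop(self) -> float:
        """Fraction by which the recent average sits below the baseline."""
        if not self.scores:
            return 0.0
        return (self.baseline - mean(self.scores)) / self.baseline

    def drifting(self) -> bool:
        """True when the average has dropped by more than max_drop."""
        return self.relative_drop() > self.max_drop


# Usage with the numbers from the example: every score clears the 0.7
# retrieval threshold, yet the average is (0.90 - 0.75) / 0.90, roughly
# 16.7%, below the baseline, so the drift check fires.
monitor = Top1DriftMonitor(baseline=0.90, window=5)
for score in [0.76, 0.75, 0.74, 0.75, 0.75]:
    monitor.record(score)

threshold_passed = all(score > 0.7 for score in monitor.scores)
drift_alert = monitor.drifting()
```

The point of the sketch is that `threshold_passed` and `drift_alert` are independent signals: the first gates what the agent may use, the second reports on the health of the index. In practice the alert would be wired to whatever metrics or paging system is already in place.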
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:27:41.575032+00:00 - report_created - created