Report #55822
[synthesis] Agent generates code that passes syntax checks but fails runtime tests due to deprecated API usage, despite RAG pipeline showing 100% retrieval success
Track the age of the retrieved documents (e.g., git commit timestamp or last-modified date) alongside retrieval confidence. If the average age of retrieved context for a specific query class suddenly increases, flag the RAG pipeline as stale before user errors manifest.
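A minimal sketch of this check, assuming each retrieved chunk carries a timezone-aware timestamp (git commit time or last-modified date). The `StalenessMonitor` class, its parameters, and the 1.5x ratio threshold are hypothetical choices for illustration, not part of any RAG library:

```python
from collections import defaultdict, deque
from datetime import datetime, timezone
from statistics import mean


class StalenessMonitor:
    """Tracks the mean age of retrieved chunks per query class (hypothetical helper)."""

    def __init__(self, window=50, min_history=5, ratio_threshold=1.5):
        self.min_history = min_history          # retrievals needed before flagging
        self.ratio_threshold = ratio_threshold  # assumed: flag at 1.5x the baseline age
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, query_class, chunk_timestamps, now=None):
        """Record one retrieval; return True if its mean age jumped above the baseline."""
        if not chunk_timestamps:
            return False  # nothing retrieved, so there is no age to compare
        now = now or datetime.now(timezone.utc)
        ages_days = [(now - ts).total_seconds() / 86400 for ts in chunk_timestamps]
        current = mean(ages_days)
        past = self.history[query_class]
        # Compare against the rolling mean of earlier retrievals for this class.
        stale = (
            len(past) >= self.min_history
            and current > self.ratio_threshold * mean(past)
        )
        past.append(current)
        return stale
```

Calling `monitor.record("auth", [chunk.last_modified for chunk in results])` after each retrieval returns `True` once the mean age for that query class rises well above its recent baseline. Stale readings are still appended to the history here, so a sustained shift eventually becomes the new baseline; whether that is desirable depends on how the alert is consumed.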
Journey Context:
RAG pipelines are typically monitored on retrieval metrics (MRR, similarity score). As a codebase evolves, chunks embedded from old code keep scoring high on similarity even after the code they describe has become functionally incorrect. The agent retrieves deprecated code with high confidence, generates syntactically valid code, and the orchestrator records a success. The failure only surfaces downstream, in CI/CD or at runtime. By monitoring the temporal drift of retrieved chunks rather than just their similarity, you catch the degradation at the retrieval layer, not the execution layer.
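The gap between the two signals can be illustrated with a per-retrieval check that reports similarity and freshness separately. This is a sketch under assumptions: `RetrievedChunk`, `check_retrieval`, the 0.8 similarity floor, and the 180-day age limit are all hypothetical names and values chosen for the example:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RetrievedChunk:
    text: str
    similarity: float
    last_modified: datetime  # e.g. git commit timestamp of the source file


def check_retrieval(chunks, now, min_similarity=0.8, max_age=timedelta(days=180)):
    """Return separate verdicts so a high similarity score cannot mask stale context."""
    similarity_ok = all(c.similarity >= min_similarity for c in chunks)
    stale = [c for c in chunks if now - c.last_modified > max_age]
    return {
        "similarity_ok": similarity_ok,
        "freshness_ok": not stale,
        "stale_chunks": stale,
    }


# A deprecated snippet can still match the query closely.
now = datetime(2026, 6, 20, tzinfo=timezone.utc)
chunk = RetrievedChunk(
    text="client.connect_legacy(host)",  # made-up deprecated call
    similarity=0.93,
    last_modified=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
report = check_retrieval([chunk], now)
```

In this example `report["similarity_ok"]` is true while `report["freshness_ok"]` is false, which is the case a similarity-only dashboard would count as a success.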
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:11:27.274677+00:00— report_created — created