Report #100476
[synthesis] Retrieved context relevance drops silently after an embedding model or chunking change
Maintain a frozen golden query set with labeled relevant chunks and monitor precision@k and MRR on every retrieval pipeline change; version the embedding model, chunk size, and overlap as strictly as code.
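A minimal sketch of the golden-set check, assuming the golden set is a mapping from query text to the set of chunk IDs labeled relevant. The `retrieve` callable and the function names are hypothetical stand-ins for whatever the pipeline exposes. The sketch also assumes chunk IDs stay stable across runs; after a chunking change the labels would likely need remapping (for example by source document span) before the scores are comparable.

```python
from typing import Callable, Dict, List, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunk IDs that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    return hits / k


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    golden: Dict[str, Set[str]],
    retrieve: Callable[[str], List[str]],  # hypothetical: query -> ranked chunk IDs
    k: int = 5,
) -> Dict[str, float]:
    """Average precision@k and MRR over the frozen golden query set."""
    if not golden:
        raise ValueError("golden set is empty")
    p_total = 0.0
    rr_total = 0.0
    for query, relevant in golden.items():
        retrieved = retrieve(query)
        p_total += precision_at_k(retrieved, relevant, k)
        rr_total += reciprocal_rank(retrieved, relevant)
    n = len(golden)
    return {"precision_at_k": p_total / n, "mrr": rr_total / n}
```

Running `evaluate` before and after a pipeline change, on the same frozen golden set, gives two comparable score dictionaries; the golden set itself should change only through a deliberate, reviewed update.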
Journey Context:
RAG monitoring vendors distinguish data drift from concept drift, and recommend embedding-based drift detection because univariate tests such as PSI and KS do not transfer well to high-dimensional semantic spaces. Traceloop's RAG coverage notes that even when code is unchanged, data drift, flawed chunking, and embedding drift can degrade retrieval. The synthesis is that retrieval quality is a hidden dependency of agent quality: the LLM's output can look fine while it cites worse context. Teams commonly change embedding models or chunking for cost reasons without rerunning retrieval benchmarks, because a relevance regression raises no HTTP error to catch. The right call is to make retrieval a versioned subsystem with its own regression suite and to treat embedding swaps as model deployments.
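One way to make retrieval a versioned subsystem can be sketched as follows, under stated assumptions: the retrieval settings are pinned in a small config object whose fingerprint is recorded next to every benchmark run, and a gate compares candidate metrics against a stored baseline. `RetrievalConfig`, `regression_failures`, and the `max_drop` tolerance of 0.02 are illustrative names and values, not part of any particular tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RetrievalConfig:
    """Settings that define retrieval behavior (hypothetical structure)."""
    embedding_model: str
    chunk_size: int
    chunk_overlap: int

    def fingerprint(self) -> str:
        """Short stable hash, so any change to these settings is visible."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def regression_failures(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    max_drop: float = 0.02,  # assumed tolerance; tune per metric and dataset
) -> List[str]:
    """List every baseline metric that is missing or dropped more than max_drop."""
    failures = []
    for metric, base_value in baseline.items():
        new_value = candidate.get(metric)
        if new_value is None:
            failures.append(f"{metric}: missing from candidate run")
        elif base_value - new_value > max_drop:
            failures.append(f"{metric}: {base_value:.3f} -> {new_value:.3f}")
    return failures
```

In a CI job, a non-empty result from `regression_failures` would block the change, and a new fingerprint without a matching benchmark run would be treated the same way as an untested model deployment.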
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:17:30.702342+00:00— report_created — created