Report #76420
[synthesis] RAG retrieval quality drops after embedding model update with no errors and no obvious cause
Enforce embedding model version consistency: store the embedding model identifier alongside every vector in your index. Before querying, verify the query embedding model matches the index embedding model. When updating embedding models, re-embed the entire corpus atomically — never query against a partially re-embedded index. Implement a 'retrieval health check': embed a known query, retrieve, and verify expected documents are in top-k results. Run this check on every deployment and on a schedule.
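The guard and health check described above can be sketched as follows. This is a minimal in-memory illustration, not a specific vector database's API: `StoredVector`, `ModelMismatchError`, `search`, and `retrieval_health_check` are hypothetical names, and the `embed` callable stands in for whatever embedding client is actually in use.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredVector:
    doc_id: str
    vector: tuple
    model_id: str  # embedding model identifier stored alongside the vector


class ModelMismatchError(RuntimeError):
    """Raised when query and index embeddings come from different models."""


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_vector, query_model_id, k=3):
    # Refuse to query unless every stored vector came from the query's model.
    # An empty or mixed-model index also fails this check.
    index_models = {item.model_id for item in index}
    if index_models != {query_model_id}:
        raise ModelMismatchError(
            f"query model {query_model_id!r} vs index models {sorted(index_models)}"
        )
    ranked = sorted(
        index, key=lambda item: cosine(item.vector, query_vector), reverse=True
    )
    return [item.doc_id for item in ranked[:k]]


def retrieval_health_check(index, embed, model_id, probes, k=3):
    """Return (query, missing_doc_ids) for each probe whose expected docs
    are not all in the top-k. `probes` maps a known query to expected doc ids."""
    failures = []
    for query, expected in probes.items():
        top_k = search(index, embed(query), model_id, k=k)
        missing = sorted(set(expected) - set(top_k))
        if missing:
            failures.append((query, missing))
    return failures
```

An empty list from `retrieval_health_check` means every probe passed. A deployment step or scheduled job could fail loudly on a non-empty result, which turns the otherwise silent mismatch into a visible error.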
Journey Context:
This happens in two scenarios: (1) the team updates the embedding model for queries but forgets to re-embed the corpus (or vice versa), and (2) the embedding provider updates the model behind the scenes. In both cases, query embeddings and index embeddings are in different vector spaces. Retrieval still 'works' in that it returns nearest neighbors, but nearest neighbors in a mismatched space are semantically wrong. No errors are thrown. The agent still produces answers based on retrieved content, but the content is increasingly irrelevant. The synthesis: embedding model consistency is the 'type system' of vector retrieval, but unlike type systems, mismatches don't cause compile errors; they cause silent semantic failures. Storing the model identifier with vectors is the equivalent of type annotations: it makes inconsistency detectable. Atomic re-embedding is critical because partial re-embedding creates a split-brain index where some vectors are in the old space and some in the new, making retrieval unpredictably wrong.
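One way to avoid the split-brain state is to build the re-embedded index off to the side and swap it in with a single step. The sketch below assumes an in-memory index and uses a hypothetical `VersionedIndex` class. A production vector store would more likely achieve the same effect with a new collection plus an alias or pointer switch, which is not shown here.

```python
import threading


class VersionedIndex:
    """Holds one complete (model_id, vectors) snapshot and replaces it in one step."""

    def __init__(self, model_id, vectors):
        self._lock = threading.Lock()
        self._snapshot = (model_id, dict(vectors))

    def snapshot(self):
        # Readers always receive a snapshot built entirely with one model.
        with self._lock:
            return self._snapshot

    def reembed(self, corpus, new_model_id, embed):
        # Stage the full replacement first. If embed() raises partway through,
        # the exception propagates and the live snapshot is left untouched.
        staged = {doc_id: embed(text) for doc_id, text in corpus.items()}
        # Publish model id and vectors together so they can never disagree.
        with self._lock:
            self._snapshot = (new_model_id, staged)
```

Because the model identifier and the vectors are replaced together, a reader can compare the snapshot's model id with the query model before searching and never sees a mix of old and new vectors.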
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:51:53.845852+00:00— report_created — created