Report #6528
[architecture] Severe performance degradation or index corruption in vector databases when frequently updating or deleting vectors under IVF indexes
Use HNSW (Hierarchical Navigable Small World) indexes instead of IVF (Inverted File) indexes for vector collections that see frequent updates, deletes, or real-time inserts. HNSW supports incremental updates without full index rebuilds. IVF needs periodic retraining (re-clustering), which can lock the index while it rebuilds; if retraining is skipped, recall degrades as the vector distribution shifts.
Journey Context:
Engineers building RAG systems or recommendation engines often start with IVF indexes (like IVFFlat in pgvector) because they offer lower memory overhead and faster initial build times for static datasets. However, as the application matures and users update profiles or documents, they notice query latency spikes and eventually severe recall drops or apparent index corruption. This happens because IVF partitions the vector space into Voronoi cells around centroids learned at training time; as vectors are added or removed, those centroids no longer reflect the data, so cells become unbalanced and the index must be retrained to maintain accuracy. HNSW, while consuming more memory, builds a navigable graph structure that supports incremental insertions and deletions by locally adjusting graph connections (many implementations mark deleted nodes and clean them up later), making it suitable for high-churn scenarios without downtime for retraining.
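The stale-centroid effect described above can be illustrated with a toy model. This is a minimal sketch in pure Python, not how pgvector or any other engine implements IVF: plain k-means stands in for the training step, and the function names (`train_centroids`, `cell_sizes`, `run_demo`), the 2-D data, and the drift pattern are all assumptions made for illustration. It trains centroids on an initial corpus, inserts vectors from a shifted distribution without retraining, and compares cell sizes before and after a retrain.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def train_centroids(vectors, k, rng, iters=10):
    """Plain Lloyd's k-means; a stand-in for the IVF training step."""
    centroids = rng.sample(vectors, k)
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for v in vectors:
            cells[nearest(centroids, v)].append(v)
        # Recompute each centroid as its cell mean; keep the old one if the cell is empty.
        centroids = [
            tuple(sum(col) / len(cell) for col in zip(*cell)) if cell else c
            for cell, c in zip(cells, centroids)
        ]
    return centroids


def cell_sizes(centroids, vectors):
    """Number of vectors that fall into each centroid's cell."""
    sizes = [0] * len(centroids)
    for v in vectors:
        sizes[nearest(centroids, v)] += 1
    return sizes


def run_demo(seed=0, k=4):
    rng = random.Random(seed)
    # Initial corpus: uniform in the unit square (hypothetical data).
    initial = [(rng.random(), rng.random()) for _ in range(200)]
    centroids = train_centroids(initial, k, rng)
    before = cell_sizes(centroids, initial)

    # Drifted inserts: a new cluster far from the training data.
    drifted = [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(1000)]
    corpus = initial + drifted
    stale = cell_sizes(centroids, corpus)  # centroids were NOT retrained
    retrained = cell_sizes(train_centroids(corpus, k, rng), corpus)
    return before, stale, retrained


if __name__ == "__main__":
    before, stale, retrained = run_demo()
    print("cell sizes after initial training:", before)
    print("cell sizes with stale centroids:  ", stale)
    print("cell sizes after retraining:      ", retrained)
```

With stale centroids, the drifted vectors can only land in cells that were laid out for the old distribution, so at least one cell grows far beyond its original size and every query probing it scans a much longer list. Retraining on the full corpus usually spreads the vectors more evenly, which is the rebuild cost that an incrementally updated graph index avoids.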
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T00:18:20.644472+00:00— report_created — created