Report #49185
[architecture] IVFFlat index in pgvector has low recall or slow build times on high-dimensional vectors
Use an HNSW (Hierarchical Navigable Small World) index for high recall and fast queries on high-dimensional data such as OpenAI embeddings:
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops) WITH (m=16, ef_construction=64);
Use IVFFlat only for large, mostly static datasets where memory is limited, faster index builds matter, and lower recall is acceptable.
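A minimal sketch of the recommended setup, assuming pgvector 0.5.0 or later and a hypothetical `items` table holding 1536-dimensional embeddings (the table layout and the row id used in the query are illustrative, not taken from the report). The query orders by `<=>`, pgvector's cosine-distance operator, which matches `vector_cosine_ops`; an index built for one distance is not used by queries ordered by a different distance operator.

```sql
-- Requires pgvector >= 0.5.0 (the first release with HNSW).
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table; 1536 dimensions matches the report's OpenAI example.
CREATE TABLE IF NOT EXISTS items (
    id        bigserial PRIMARY KEY,
    embedding vector(1536)
);

-- HNSW index for cosine distance; m = 16 and ef_construction = 64 are pgvector's defaults.
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Query-time candidate list size (default 40). Higher values raise recall at the
-- cost of latency; keep it at least as large as the LIMIT you query with.
SET hnsw.ef_search = 100;

-- 10 nearest neighbours of row 1 by cosine distance (row 1 itself is usually first).
SELECT id
FROM items
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10;
```

Keeping the build parameters at their defaults and tuning `hnsw.ef_search` per session is a cheap first experiment, since it changes recall without rebuilding the index.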
Journey Context:
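Teams default to IVFFlat (the older index type) because tutorials mention it, but it needs its lists parameter tuned to the table size (tutorials typically hard-code lists=100), and queries get slow when pushed toward high recall (0.95+), because more lists must be probed per query. IVFFlat also degrades after updates: its list centroids are computed once at build time from the rows present then, so after heavy inserts or updates they no longer fit the data, recall drops, and a REINDEX is needed to recompute them.
HNSW (available in pgvector 0.5.0+) navigates a graph instead, offering a better recall/speed tradeoff and incremental maintenance: new rows are inserted into the graph without a full rebuild. A common mistake is keeping the defaults (m=16, ef_construction=64) for very high dimensions such as 1536-dimensional OpenAI embeddings without trying a higher ef_construction (64 is the default; values toward 128 improve graph quality at the cost of build time). HNSW also uses more memory and builds more slowly than IVFFlat; if RAM is constrained, IVFFlat with lists sized to the table (roughly sqrt(rows) for large tables) is still a valid choice. This decision directly affects vector search quality.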
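The tradeoffs above can be sketched as two alternative configurations for the same hypothetical `items` table; create one or the other, not both. Index names, the 4M-row table size, the memory setting and the parameter values are illustrative assumptions. The IVFFlat sizing follows the pgvector README's starting-point guidance (lists ≈ rows / 1000 up to about 1M rows, ≈ sqrt(rows) beyond that; probes ≈ sqrt(lists)); check the README for your version before relying on it.

```sql
-- Option A: higher-quality HNSW build for 1536-d embeddings.
-- Raising ef_construction above its default (64) improves graph quality and recall
-- but slows the build; builds are much faster when the graph fits in maintenance_work_mem.
SET maintenance_work_mem = '2GB';
CREATE INDEX items_embedding_hnsw ON items
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 128);

-- Option B: IVFFlat when RAM is tight and the data is mostly static.
-- Create it AFTER loading data: list centroids are computed from existing rows.
-- Hypothetical 4M-row table: lists ≈ sqrt(4,000,000) = 2000.
CREATE INDEX items_embedding_ivfflat ON items
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 2000);

-- Lists probed per query (default 1); sqrt(2000) ≈ 45 is a starting point.
-- More probes raise recall and latency together.
SET ivfflat.probes = 45;

-- After heavy inserts/updates, rebuild so centroids match the data again
-- (CONCURRENTLY needs PostgreSQL 12+ and must run outside a transaction block).
REINDEX INDEX CONCURRENTLY items_embedding_ivfflat;

-- Measuring recall: get exact neighbours for the same query vector and compare
-- them with the approximate (index) results.
BEGIN;
SET LOCAL enable_indexscan = off;  -- steers the planner to an exact scan for this transaction only
SELECT id
FROM items
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10;
COMMIT;
```

Recall for a sample of query vectors is the fraction of the exact top-10 ids that also appear in the index-backed top-10; comparing that figure across ef_construction, ef_search or probes settings is a more reliable basis for the HNSW-versus-IVFFlat decision than defaults taken from tutorials.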
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:02:22.365226+00:00— report_created — created