Report #61444
[architecture] Poor recall or slow build times when adding vector similarity search to PostgreSQL with pgvector
Use HNSW (Hierarchical Navigable Small World) indexes for high-recall, low-latency approximate nearest neighbor (ANN) search on vectors of up to 2,000 dimensions. Build with m=16 and ef_construction=64 (pgvector's defaults) and set hnsw.ef_search=32 at query time. Avoid IVFFlat unless storage is severely constrained and you can tolerate higher latency and the effort of tuning probes.
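Below is a minimal Python sketch of the corresponding SQL. It assumes a hypothetical table `items` with a vector column `embedding` and L2 distance (`vector_l2_ops`). The helpers only build statement strings, so you can run the output with whichever driver you use. ef_search is not an index option. It is the session setting `hnsw.ef_search` (pgvector's built-in default is 40), so the sketch sets it separately. The IVFFlat helper uses the starting points suggested in the pgvector README: lists = rows / 1000 for up to 1M rows and sqrt(rows) above that, with probes = sqrt(lists). All function names are illustrative.

```python
import math

# Hypothetical helpers: table/column names are interpolated directly into the
# SQL text, so callers must pass trusted identifiers only.

def hnsw_index_sql(table: str, column: str, m: int = 16,
                   ef_construction: int = 64,
                   opclass: str = "vector_l2_ops") -> str:
    """CREATE INDEX for an HNSW index (m=16, ef_construction=64 are pgvector defaults)."""
    if ef_construction < 2 * m:
        # pgvector refuses to build an HNSW index when ef_construction < 2 * m
        raise ValueError("ef_construction must be >= 2 * m")
    return (f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
            f"WITH (m = {m}, ef_construction = {ef_construction});")

def hnsw_ef_search_sql(ef_search: int) -> str:
    """Query-time recall/latency knob; a session setting, not an index option."""
    return f"SET hnsw.ef_search = {ef_search};"

def ivfflat_starting_point(rows: int) -> tuple[int, int]:
    """README starting points: lists = rows/1000 (<= 1M rows) else sqrt(rows); probes = sqrt(lists)."""
    lists = rows // 1000 if rows <= 1_000_000 else int(math.sqrt(rows))
    lists = max(lists, 1)
    probes = max(int(math.sqrt(lists)), 1)
    return lists, probes

def ivfflat_index_sql(table: str, column: str, rows: int,
                      opclass: str = "vector_l2_ops") -> tuple[str, str]:
    """CREATE INDEX plus the matching ivfflat.probes setting for the IVFFlat fallback."""
    lists, probes = ivfflat_starting_point(rows)
    return (f"CREATE INDEX ON {table} USING ivfflat ({column} {opclass}) "
            f"WITH (lists = {lists});",
            f"SET ivfflat.probes = {probes};")

if __name__ == "__main__":
    print(hnsw_index_sql("items", "embedding"))
    print(hnsw_ef_search_sql(32))
```

If you use `SET LOCAL` inside a transaction instead of `SET`, the ef_search value applies only to that transaction and not to the whole session.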
Journey Context:
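pgvector offers two ANN index types: IVFFlat (an inverted file index that stores vectors flat, i.e. uncompressed) and HNSW. IVFFlat uses k-means to partition vectors into clusters called lists. At query time it scans only the probes lists whose centroids are closest to the query, and it searches every vector in those lists exactly. You have to tune both lists (at build time) and probes (at query time, via ivfflat.probes). The index also suffers from the curse of dimensionality: above ~1,000 dimensions, recall can drop sharply unless probes approaches lists, which defeats the performance gain. Build cost grows with rows × lists because every vector is compared against every centroid, so it can approach O(n^2) when lists is scaled in proportion to rows. Even so, IVFFlat usually builds faster than HNSW.
HNSW builds a multi-layer proximity graph with roughly logarithmic search complexity. It performs robustly up to 2,000 dimensions and needs little parameter tuning. m is the maximum number of connections per node in each layer. ef_construction is the size of the candidate list used while building the graph, where higher values give a better graph and slower builds. hnsw.ef_search is the size of the candidate list at query time, where higher values give better recall and slower queries.
Tradeoff: HNSW uses significantly more memory (approximately 2-4x the vector data size, depending on m) and builds more slowly than IVFFlat, but its query performance stays more stable as dimensionality grows. For production semantic search, HNSW is the correct default unless storage costs outweigh latency requirements.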
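The following toy, pure-Python IVFFlat makes the lists/probes tradeoff concrete. It runs a few rounds of k-means and then does an exact scan of the probed lists, and it measures the results against brute-force search. This is a sketch under stated assumptions, not pgvector's implementation: it uses random Gaussian data and L2 distance, and the helper names (`build_ivfflat`, `ivfflat_search`, `exact_search`, `recall`) are hypothetical.

```python
import math
import random

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_list(v, centroids):
    """Index of the centroid closest to v (one distance computation per list)."""
    return min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))

def build_ivfflat(vectors, lists, iters=5, seed=0):
    """Toy IVFFlat build: a few k-means rounds, then assign every vector to one list."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, lists)]
    for _ in range(iters):
        buckets = [[] for _ in range(lists)]
        for v in vectors:  # rows x lists distance computations per round
            buckets[nearest_list(v, centroids)].append(v)
        for c, bucket in enumerate(buckets):
            if bucket:  # an empty list keeps its previous centroid
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    inverted = [[] for _ in range(lists)]
    for i, v in enumerate(vectors):
        inverted[nearest_list(v, centroids)].append(i)
    return centroids, inverted

def _ranked(query, vectors, ids, k):
    # Exact ("flat") scan; ties broken by id so results are deterministic.
    return sorted(ids, key=lambda i: (l2(query, vectors[i]), i))[:k]

def ivfflat_search(query, vectors, centroids, inverted, probes, k):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:probes] for i in inverted[c]]
    return _ranked(query, vectors, candidates, k)

def exact_search(query, vectors, k):
    """Brute-force ground truth over every vector."""
    return _ranked(query, vectors, range(len(vectors)), k)

def recall(approx, exact):
    return len(set(approx) & set(exact)) / len(exact)

if __name__ == "__main__":
    rng = random.Random(42)
    dim, rows, k, lists = 16, 1000, 10, 20
    data = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(rows)]
    queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
    centroids, inverted = build_ivfflat(data, lists)
    for probes in (1, 2, 5, lists):
        r = sum(recall(ivfflat_search(q, data, centroids, inverted, probes, k),
                       exact_search(q, data, k)) for q in queries) / len(queries)
        print(f"probes={probes:>2} recall@{k}={r:.2f}")
```

Raising probes only enlarges the candidate set, so recall never decreases. At probes = lists the scan is exhaustive and matches exact search, which means every query pays for scanning all the data. The rows × lists distance computations per k-means round in `build_ivfflat` are the build cost described above.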
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:37:05.166124+00:00— report_created — created