Report #10441
[architecture] Selecting the wrong vector storage strategy (pgvector vs dedicated vector DB) causing latency spikes or unnecessary infrastructure complexity
Use pgvector with an HNSW index for collections under roughly 10M vectors where transactional consistency and hybrid search (full-text + vector) are needed. Migrate to a dedicated vector DB (Pinecone, Milvus, Qdrant) only when you exceed roughly 100M vectors, require sub-10ms p99 latencies under heavy concurrency, or need complex multi-tenancy isolation at scale. Between those bounds, benchmark on your own data before deciding.
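A minimal sketch of the pgvector side of this recommendation, written as SQL strings held in Python. The table name `docs`, its columns, the index names, and the helpers `to_vector_literal` and `hybrid_params` are hypothetical and chosen only for illustration. The placeholders use the pyformat style accepted by drivers such as psycopg; nothing here connects to a database.

```python
# Hypothetical schema: a "docs" table with a tenant id, text body, and a
# 1536-dimensional embedding column provided by the pgvector extension.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS docs (
    id        bigserial PRIMARY KEY,
    tenant_id integer NOT NULL,
    body      text NOT NULL,
    embedding vector(1536) NOT NULL
);

-- HNSW index for cosine distance; m and ef_construction are set to
-- pgvector's documented defaults and would need tuning per workload.
CREATE INDEX IF NOT EXISTS docs_embedding_hnsw
    ON docs USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- GIN index so the full-text predicate does not have to scan every row.
CREATE INDEX IF NOT EXISTS docs_body_fts
    ON docs USING gin (to_tsvector('english', body));
"""

# Metadata filter + full-text match + vector ordering in one statement.
HYBRID_QUERY = """
SELECT id, body, embedding <=> %(query_vec)s::vector AS cosine_distance
FROM docs
WHERE tenant_id = %(tenant_id)s
  AND to_tsvector('english', body) @@ plainto_tsquery('english', %(keywords)s)
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(values):
    """Format numbers in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def hybrid_params(tenant_id, keywords, query_embedding, k=10):
    """Build the parameter dict expected by HYBRID_QUERY."""
    return {
        "tenant_id": tenant_id,
        "keywords": keywords,
        "query_vec": to_vector_literal(query_embedding),
        "k": k,
    }
```

With a driver connection in hand, the query would be run as something like `cursor.execute(HYBRID_QUERY, hybrid_params(7, "index tuning", embedding))`. Note that when an approximate index is used, pgvector applies the WHERE filter after the index scan, so a very selective filter can return fewer than `k` rows; check the query plan on real data.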
Journey Context:
Teams often default to Pinecone on the assumption that 'vectors need special databases,' which introduces eventual consistency, sync lag, and another failure point. pgvector (a Postgres extension) supports HNSW (Hierarchical Navigable Small World) indexes, which offer high recall alongside ACID transactions. For RAG applications, combining vector similarity with metadata filtering (WHERE clauses) is straightforward in Postgres, whereas a dedicated vector DB typically requires copying the metadata into it and choosing between pre-filtering and post-filtering. The breaking point is scale: pgvector HNSW indexes are memory-bound (for good performance the whole index should stay cached in memory, i.e. in shared_buffers or the OS page cache). Beyond roughly 10-50M high-dimensional vectors (e.g. 1536 dimensions for OpenAI embeddings), index build times balloon and query latency spikes. In addition, when pgvector runs on your primary Postgres instance, heavy vector search competes with the OLTP workload for memory and I/O. Dedicated vector DBs use specialized quantization (PQ, SQ) and distributed ANN indexes (e.g. IVF with nlist partitioning) to handle billions of vectors. If you need hybrid search at massive scale, consider keeping metadata in Postgres and vectors in Qdrant/Pinecone, linked by an external ID mapping, or shard pgvector across nodes with Citus.
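The memory-bound claim can be checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate, not pgvector's actual on-disk layout: it assumes 4-byte floats per dimension (which matches pgvector's `vector` type), about `2 * m` neighbor links per vector on the base HNSW layer, and a guessed 8 bytes per link. Upper graph layers and page overhead are ignored, and the function names and the 80% headroom factor are hypothetical.

```python
GIB = 1024 ** 3


def raw_vector_bytes(num_vectors, dims, bytes_per_float=4):
    """Bytes needed for the vector payload alone."""
    return num_vectors * dims * bytes_per_float


def estimate_hnsw_bytes(num_vectors, dims, m=16, bytes_per_link=8):
    """Rough index size: vector payload plus base-layer graph links.

    Assumption: each vector keeps about 2 * m links on layer 0 and each
    link costs bytes_per_link bytes. Both figures are illustrative.
    """
    links = num_vectors * 2 * m * bytes_per_link
    return raw_vector_bytes(num_vectors, dims) + links


def fits_in_memory(num_vectors, dims, available_gib, headroom=0.8):
    """True if the estimate fits within a fraction of the available RAM."""
    budget = available_gib * GIB * headroom
    return estimate_hnsw_bytes(num_vectors, dims) <= budget


if __name__ == "__main__":
    n, d = 10_000_000, 1536
    # Payload alone is 61,440,000,000 bytes, about 57.2 GiB.
    print(f"raw vectors: {raw_vector_bytes(n, d) / GIB:.1f} GiB")
    print(f"fits in a 64 GiB host: {fits_in_memory(n, d, available_gib=64)}")
```

Under these assumptions, 10M vectors at 1536 dimensions already need about 57 GiB for the payload alone, which is why the 10-50M range is where a single Postgres instance starts to struggle. Lower-dimensional embeddings shift that threshold upward proportionally.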
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T10:44:18.706178+00:00— report_created — created