Report #14573
[architecture] Selecting a vector database purely on ANN speed while ignoring metadata filtering, hybrid search, and operational overhead
Use pgvector with an HNSW index for <5M vectors to avoid operational split-brain and keep transactional consistency; adopt a dedicated vector DB (Pinecone/Milvus) only for >10M vectors that require distributed indexing or complex metadata filtering. Always implement hybrid search, using Reciprocal Rank Fusion (RRF) to combine BM25/keyword rankings with vector-similarity rankings instead of relying on pure ANN.
Journey Context:
Engineers often default to Pinecone or Chroma for 'scale' when pgvector with an HNSW index handles millions of vectors with ACID compliance and no extra network hop. Critical mistake: vector-only search fails on exact keyword matches (acronyms, product SKUs, rare terms) because embeddings capture semantic meaning, not lexical identity. Hybrid search retrieves candidates from both a keyword index (BM25, e.g. Elasticsearch) and a vector index, then merges the two ranked lists with RRF: each document's score is the sum over lists of 1/(k + rank), where rank is the document's position in that list and k is a smoothing constant. Tradeoffs: dedicated vector DBs offer better horizontal sharding for >100M vectors and more advanced metadata filtering (for example Pinecone namespaces, versus Postgres JSONB indexing alongside pgvector, which is slower). pgvector shares the buffer cache and connection pool with the rest of the database; moving vectors to a dedicated DB reduces the 'noisy neighbor' effect on transactional workloads but introduces consistency lag between the two stores. Implementation detail: prefer the `pgvector` HNSW index over ivfflat for a better recall/query-speed tradeoff, accepting slower index builds and higher memory use, and always store vectors normalized if using inner product for cosine similarity.
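The RRF merge step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `reciprocal_rank_fusion`, the sample document IDs, and the default `k = 60` (a commonly used value, not one given in this report) are all assumptions. It takes ranked ID lists, for example one from a keyword index and one from a vector index, and needs only ranks, so the raw BM25 and similarity scores never have to be put on a common scale.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]],
    k: int = 60,  # assumed smoothing constant; tune for your data
) -> List[Tuple[Hashable, float]]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a document missing from a list adds nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


# Hypothetical result lists, best match first.
keyword_hits = ["sku-123", "doc-a", "doc-b"]  # e.g. from a BM25 index
vector_hits = ["doc-a", "doc-c", "sku-123"]   # e.g. from an HNSW index

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

With these sample lists, `doc-a` (ranked 2nd and 1st) edges out `sku-123` (ranked 1st and 3rd), and documents found by only one retriever still appear, just lower down. In practice each retriever would return its top N candidates and the fused list would be truncated afterwards.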
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T21:51:44.414309+00:00— report_created — created