Report #16513
[architecture] Prematurely adopting dedicated vector databases for RAG applications with <1M embeddings
Start with pgvector (the Postgres extension) and an HNSW index for up to about 1M high-dimensional vectors (for example, 1536-dimensional OpenAI embeddings). Migrate to Pinecone, Weaviate, or Milvus only when you need hybrid search (sparse + dense), selective metadata filtering that avoids pgvector's post-filtering performance cliff, or horizontal sharding beyond single-node limits.
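A minimal sketch of that starting point, written as Python constants that hold the SQL. The table name `chunks`, its columns, and the index name are hypothetical; cosine distance is an assumption about how the embeddings are compared; and `search` assumes an already-open psycopg 3 connection. The `m` and `ef_construction` values are pgvector's documented defaults, not tuned recommendations.

```python
# Hypothetical table, column, and index names; adjust to your schema.
SETUP_SQL = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    "CREATE TABLE IF NOT EXISTS chunks ("
    " id bigserial PRIMARY KEY,"
    " content text,"
    " embedding vector(1536))",
    # m and ef_construction are shown at pgvector's documented defaults.
    "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw ON chunks"
    " USING hnsw (embedding vector_cosine_ops)"
    " WITH (m = 16, ef_construction = 64)",
]

# <=> is pgvector's cosine-distance operator, matching vector_cosine_ops.
QUERY_SQL = (
    "SELECT id, content FROM chunks"
    " ORDER BY embedding <=> %s::vector LIMIT %s"
)


def to_vector_literal(values):
    """Format a Python sequence as pgvector text input, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def search(conn, query_embedding, k=5):
    """Run the top-k query on an open psycopg 3 connection (assumed driver)."""
    params = (to_vector_literal(query_embedding), k)
    return conn.execute(QUERY_SQL, params).fetchall()
```

Running each statement in `SETUP_SQL` once and then calling `search(conn, embedding)` would be the whole retrieval path in this sketch; nothing here depends on a separate vector service.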
Journey Context:
pgvector with an HNSW index can provide >99% recall with millisecond latency for millions of vectors on modest hardware, given suitable index and search parameters. The critical failure mode is metadata filtering: when a query combines a filter with an HNSW index scan, pgvector runs the approximate nearest neighbor search first and applies the filter afterwards (post-filtering), so a selective metadata filter can return too few results (the 'post-filtering problem'). Dedicated vector DBs typically apply the filter during the search (pre-filtering) by combining a metadata index with the vector index. Postgres connection limits (max_connections) can also bottleneck high-concurrency RAG apps unless connections are pooled. Don't pay Pinecone costs until you've proven that pgvector's metadata filtering is your bottleneck, rather than your embedding quality or chunking strategy.
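The post-filtering problem described above can be illustrated with a small, database-free simulation. This is a toy sketch, not pgvector's implementation: an exact sort stands in for the HNSW index, the `tenant` field is a hypothetical metadata column, and the data is deliberately constructed so that rows matching the filter are never among the closest neighbors. The `candidates` parameter plays a role loosely analogous to the number of candidates an index scan examines.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query, rows, limit):
    """Exact nearest neighbours; a stand-in for an approximate index scan."""
    return sorted(rows, key=lambda r: squared_distance(query, r["vec"]))[:limit]


def post_filter(query, rows, k, tenant, candidates):
    """Search first, then drop candidates that fail the metadata filter."""
    hits = nearest(query, rows, candidates)
    return [r for r in hits if r["tenant"] == tenant][:k]


def pre_filter(query, rows, k, tenant):
    """Apply the metadata filter first, then search only the matching rows."""
    matching = [r for r in rows if r["tenant"] == tenant]
    return nearest(query, matching, k)


# Toy data: 1000 rows on a line; only every 100th row belongs to "rare".
ROWS = [
    {"id": i, "vec": [float(i), 0.0], "tenant": "rare" if i % 100 == 0 else "common"}
    for i in range(1, 1001)
]
QUERY = [0.0, 0.0]

if __name__ == "__main__":
    few = post_filter(QUERY, ROWS, k=5, tenant="rare", candidates=40)
    wide = post_filter(QUERY, ROWS, k=5, tenant="rare", candidates=500)
    pre = pre_filter(QUERY, ROWS, k=5, tenant="rare")
    print(len(few), len(wide), [r["id"] for r in pre])
```

In this constructed data, post-filtering over 40 candidates returns no rows for the selective filter, while pre-filtering returns five. Widening the candidate set to 500 recovers all five, which shows why over-fetching works but gets more expensive as the filter gets more selective.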
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T02:51:10.053305+00:00— report_created — created