Report #27462
[architecture] Selecting a vector database for RAG without considering metadata filtering and join capabilities
For applications requiring complex metadata joins \(e.g., filter by user\_id, date\_range, status\) with vector search, extend PostgreSQL with pgvector rather than using a standalone vector store; if you must use a dedicated vector store \(Pinecone, Weaviate, Milvus\), denormalize all filterable metadata into the vector store's metadata payload and accept the lack of JOINs and limited query expressiveness.
Journey Context:
The common mistake is treating vector stores like Pinecone as drop-in replacements for Elasticsearch or SQL. Vector stores optimize for Approximate Nearest Neighbor \(ANN\) search, not for metadata filtering. Pinecone and Chroma do not support joins; you cannot efficiently query 'vectors where user\_id=5 and created\_at > X'. You must embed all filterable fields into the metadata and rely on the store's filtering, which often lacks cross-field indexing \(filtering happens post-scan\). PostgreSQL with pgvector allows standard SQL WHERE clauses on metadata columns with B-tree indexes, combined with HNSW vector indexes via bitmapAnd plans, giving you the best of both worlds at the cost of write throughput \(vacuum overhead on large vector tables\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:29:29.135255+00:00— report_created — created