Report #90411
[architecture] Selecting a pure vector database (Pinecone, Weaviate) for RAG without evaluating filtered-search performance, leading to high latency when vector similarity is combined with high-cardinality metadata filters (tenant_id, date ranges)
Use a hybrid database (Postgres with pgvector: ivfflat or hnsw on the embedding, B-tree indexes on metadata) or a vector store with dedicated scalar indexes (Milvus/Zilliz) that can apply the metadata filter before or during the vector search; avoid post-filtering when the filter is highly selective, i.e. matches only a small fraction of rows
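A minimal sketch of what the Postgres option could look like, held as SQL strings in Python. The table name, column names, embedding dimension (768), and helper functions are hypothetical; the placeholders assume a psycopg-style driver with named parameters. Whether the planner uses the B-tree or the HNSW index for a given filter should be checked with EXPLAIN ANALYZE on real data.

```python
# Hypothetical schema: names and the vector dimension are illustrative only.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
    id         bigserial PRIMARY KEY,
    tenant_id  bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    embedding  vector(768) NOT NULL
);

-- B-tree index serving the metadata predicate (tenant, then date range).
CREATE INDEX IF NOT EXISTS documents_tenant_created_idx
    ON documents (tenant_id, created_at);

-- HNSW index for approximate nearest-neighbour search by cosine distance.
CREATE INDEX IF NOT EXISTS documents_embedding_hnsw_idx
    ON documents USING hnsw (embedding vector_cosine_ops);
"""

# Filter and similarity ordering in one statement; <=> is cosine distance.
FILTERED_SEARCH_SQL = """
SELECT id, embedding <=> %(query)s::vector AS distance
FROM documents
WHERE tenant_id = %(tenant_id)s
  AND created_at >= %(created_after)s
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(values):
    """Format a sequence of numbers as pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_search_params(query_embedding, tenant_id, created_after, k=10):
    """Build the named parameters expected by FILTERED_SEARCH_SQL."""
    return {
        "query": to_vector_literal(query_embedding),
        "tenant_id": tenant_id,
        "created_after": created_after,
        "k": k,
    }
```

With a driver such as psycopg this would be run as `cursor.execute(FILTERED_SEARCH_SQL, build_search_params(...))`; that call is not shown here because it needs a live database.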
Journey Context:
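RAG architectures often pick a specialized vector DB for its ANN performance. Real queries, however, are constrained: "find docs similar to X for tenant Y created after 2023". A vector store with no usable metadata index has to post-filter (fetch top_k * oversample nearest neighbours, then discard the rows that fail the filter), which is slow and hurts recall, because true matches can rank outside the oversampled set. In Postgres with pgvector, the planner can use the B-tree index to narrow rows to the tenant and date range and then rank the survivors by exact distance, which works well when the filter is selective; when it scans the ivfflat/hnsw index instead, the filter is applied to the candidates the index returns, and iterative index scans (pgvector 0.8.0 and later) keep fetching candidates until enough rows pass. Milvus/Zilliz maintain scalar indexes alongside HNSW and apply the filter before the vector search. Tradeoff: specialized vector DBs tend to have better raw ANN performance at billion scale, but hybrid DBs often win on the filtered queries common in multi-tenant SaaS. Common mistakes: assuming a vector DB handles metadata "well enough" without measuring 95th-percentile latency under high-cardinality filters, and using UUIDv4 IDs, whose random ordering gives poor index locality.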
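The recall loss from post-filtering can be reproduced in a small simulation. The sketch below is illustrative only: it uses exact brute-force ranking as a stand-in for an ANN index, synthetic Gaussian embeddings, and hypothetical function names; the corpus size, tenant count, and oversample factors are arbitrary assumptions, not measurements from any real system.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_corpus(n_docs, n_tenants, dim, rng):
    # Each document is (doc_id, tenant_id, embedding).
    return [
        (i, rng.randrange(n_tenants), [rng.gauss(0.0, 1.0) for _ in range(dim)])
        for i in range(n_docs)
    ]


def top_k(docs, query, k):
    # Exact ranking by dot product; stands in for an ANN index lookup.
    return sorted(docs, key=lambda d: dot(d[2], query), reverse=True)[:k]


def pre_filter_search(docs, query, tenant, k):
    # Apply the metadata filter first, then rank only the matching rows.
    candidates = [d for d in docs if d[1] == tenant]
    return [d[0] for d in top_k(candidates, query, k)]


def post_filter_search(docs, query, tenant, k, oversample):
    # Rank everything, keep k * oversample rows, then drop non-matching rows.
    candidates = top_k(docs, query, k * oversample)
    return [d[0] for d in candidates if d[1] == tenant][:k]


def recall(found, truth):
    return len(set(found) & set(truth)) / len(truth) if truth else 1.0


rng = random.Random(0)
docs = build_corpus(n_docs=5000, n_tenants=100, dim=8, rng=rng)
query = [rng.gauss(0.0, 1.0) for _ in range(8)]

# Ground truth: the best k documents that belong to the tenant.
truth = pre_filter_search(docs, query, tenant=7, k=10)

for oversample in (1, 5, 20, 100):
    found = post_filter_search(docs, query, tenant=7, k=10, oversample=oversample)
    print(f"oversample={oversample:>3}  recall={recall(found, truth):.2f}")
```

Because both searches rank by the same score, the post-filtered result is always a prefix of the pre-filtered one, so recall here is simply the fraction of the k slots that survive the filter. With a filter that matches about 1% of rows, a small oversample factor leaves most of those slots empty, and a large one means ranking far more rows than are returned.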
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:20:53.464751+00:00— report_created — created