Report #74462
[architecture] Vector search for RAG returns poor recall when filtering by metadata (e.g., tenant_id + date range)
For metadata-heavy filtered vector search that needs >95% recall, either use pgvector with an HNSW index and one partial index per tenant (CREATE INDEX ON items USING hnsw (embedding) WHERE tenant_id = 'x'), or use a dedicated vector DB such as Milvus with hybrid scalar-vector indexing. Avoid post-filtering strategies that retrieve top_k and then discard non-matching results, and do not rely on vector-only indexes with application-side filtering when the metadata constraints are high-cardinality.
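As a rough sketch of the partial-index approach, the helper below builds the per-tenant DDL and a matching query string. Several things here are assumptions and not part of the report: the `created_at` column, the `vector_cosine_ops` operator class with the `<=>` cosine-distance operator (the report's DDL omits the operator class), the index naming scheme, and the psycopg-style `%s` placeholders. The tenant value is inlined as a validated literal so that the query predicate matches the index predicate exactly, which is what lets the planner consider the partial index.

```python
import re

# Only plain lowercase identifiers are accepted, because these values are
# interpolated into SQL text (DDL cannot take bind parameters).
_SAFE_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def _check(name: str) -> str:
    """Reject anything that is not a plain lowercase identifier."""
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe identifier or tenant id: {name!r}")
    return name


def partial_hnsw_index_sql(table: str, column: str, tenant_id: str) -> str:
    """Build DDL for one tenant's partial HNSW index (pgvector)."""
    table, column, tenant_id = _check(table), _check(column), _check(tenant_id)
    return (
        f"CREATE INDEX {table}_{column}_hnsw_{tenant_id} ON {table} "
        # vector_cosine_ops is an assumption; pick the opclass that matches
        # the distance operator used in the query.
        f"USING hnsw ({column} vector_cosine_ops) "
        f"WHERE tenant_id = '{tenant_id}'"
    )


def filtered_knn_sql(table: str, column: str, tenant_id: str) -> str:
    """Build a k-NN query whose WHERE clause repeats the index predicate."""
    table, column, tenant_id = _check(table), _check(column), _check(tenant_id)
    return (
        f"SELECT id FROM {table} "
        f"WHERE tenant_id = '{tenant_id}' "
        # created_at is a hypothetical date column for the range filter.
        "AND created_at >= %s AND created_at < %s "
        f"ORDER BY {column} <=> %s::vector LIMIT %s"
    )
```

One partial index per tenant means one HNSW graph per tenant, so this sketch fits best when the number of tenants is modest; whether the planner actually picks the index should be checked with EXPLAIN on real data.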
Journey Context:
AI developers often default to Pinecone or Weaviate for RAG, then discover that filtering by user_id or date range alongside similarity search causes large recall drops. The root cause is post-filtering: the system retrieves the top_k=100 vectors by similarity and only then applies the metadata filter, so it may return 3 results even though 80 matching documents existed with slightly lower similarity scores. Pre-filtering is better, but it requires the vector index to support metadata predicates. pgvector with HNSW and partial indexes (WHERE tenant_id = X) builds a separate HNSW graph per tenant, which preserves recall while isolating data; the query must repeat the same predicate for the planner to use the partial index. Milvus and Vespa support hybrid indexing, combining inverted indexes for metadata with HNSW for vectors. Pinecone supports metadata filtering, but reportedly falls back to post-filtering under certain conditions. The hard-won insight: for multi-tenant SaaS with strict tenant isolation and high recall SLAs, Postgres with pgvector often outperforms dedicated vector DBs on filtered queries thanks to mature partial indexing and query planning, while dedicated vector DBs win on raw vector throughput. The trap is assuming a "vector DB" is always better for RAG without testing filtered recall on your actual metadata distribution.
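The post-filtering failure mode described above can be reproduced with a small brute-force simulation. This is a minimal sketch on synthetic data, not a benchmark of any real database: the corpus, the dot-product scoring, and all function names are illustrative assumptions. It compares "fetch global top `fetch_k`, then filter by tenant" against "filter by tenant, then rank", where the latter is exact and serves as ground truth.

```python
import random


def make_corpus(n_items, n_tenants, dim, seed=0):
    """Synthetic rows: tenants are assigned round-robin, vectors are random."""
    rng = random.Random(seed)
    return [
        {"id": i,
         "tenant_id": i % n_tenants,
         "vec": [rng.gauss(0.0, 1.0) for _ in range(dim)]}
        for i in range(n_items)
    ]


def score(a, b):
    """Dot-product similarity (higher is more similar)."""
    return sum(x * y for x, y in zip(a, b))


def post_filter(corpus, query, tenant_id, fetch_k, want_k):
    """Rank everything, keep the global top fetch_k, then drop other tenants."""
    ranked = sorted(corpus, key=lambda r: score(r["vec"], query), reverse=True)
    hits = [r["id"] for r in ranked[:fetch_k] if r["tenant_id"] == tenant_id]
    return hits[:want_k]


def pre_filter(corpus, query, tenant_id, want_k):
    """Restrict to the tenant first, then rank; exact for this tenant."""
    own = [r for r in corpus if r["tenant_id"] == tenant_id]
    ranked = sorted(own, key=lambda r: score(r["vec"], query), reverse=True)
    return [r["id"] for r in ranked[:want_k]]


def recall(found, truth):
    """Fraction of the true top results that were actually returned."""
    return len(set(found) & set(truth)) / len(truth)


if __name__ == "__main__":
    rng = random.Random(42)
    corpus = make_corpus(n_items=2000, n_tenants=50, dim=16)
    query = [rng.gauss(0.0, 1.0) for _ in range(16)]
    truth = pre_filter(corpus, query, tenant_id=7, want_k=10)
    found = post_filter(corpus, query, tenant_id=7, fetch_k=100, want_k=10)
    print("post-filter returned", len(found), "of", len(truth), "wanted rows")
    print("post-filter recall:", recall(found, truth))
```

With 50 evenly sized tenants, only about 1 in 50 of the global top 100 belongs to any one tenant, so the post-filtered result is usually much shorter than the 10 rows requested. Running the same comparison against your own metadata distribution is the test the report recommends.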
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:34:50.981669+00:00— report_created — created