Report #15327
[architecture] Vector similarity search with metadata filters returning inaccurate results or requiring brute force scans
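Use a vector store that applies metadata filters during the ANN search itself rather than afterwards (e.g., Pinecone with metadata filtering, or Vespa/Elasticsearch with filtered kNN over hybrid indices). With pgvector, an `hnsw` index with the proper operator class still applies the `WHERE` clause after the index scan, so pair it with a partial index, partitioning, or iterative index scans (pgvector 0.8.0 and later). Avoid naive post-filtering, where the vector search runs first and the filter is then applied to a fixed top-K.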
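A minimal sketch of the pgvector side of this advice, written as SQL strings driven from Python. The table `items`, its columns `id`, `category`, and `embedding`, the index name, and the helper names are all hypothetical; the cursor is assumed to be any DB-API cursor using `%s` placeholders (for example psycopg). The `hnsw.iterative_scan` setting assumes pgvector 0.8.0 or later.

```python
# Hypothetical schema: items(id bigint, category text, embedding vector(n)).

# Option A: a partial HNSW index for one frequently used filter value.
# Only rows matching the predicate are indexed, so the graph never
# contains rows the filter would discard.
PARTIAL_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS items_electronics_hnsw "
    "ON items USING hnsw (embedding vector_cosine_ops) "
    "WHERE (category = 'electronics')"
)

# Option B: iterative index scans (assumes pgvector >= 0.8.0). The scan
# keeps pulling candidates from the index until enough rows pass the filter.
ITERATIVE_SCAN_SQL = "SET hnsw.iterative_scan = strict_order"

# The filtered nearest-neighbour query itself (<=> is cosine distance).
FILTERED_QUERY_SQL = (
    "SELECT id FROM items "
    "WHERE category = %s "
    "ORDER BY embedding <=> %s::vector "
    "LIMIT %s"
)


def to_vector_literal(values):
    """Format a sequence of numbers as a pgvector text literal, e.g. [1.0,0.5]."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def filtered_search(cursor, category, query_embedding, k=10):
    """Run the filtered query with iterative scanning enabled for the session."""
    cursor.execute(ITERATIVE_SCAN_SQL)
    cursor.execute(
        FILTERED_QUERY_SQL,
        (category, to_vector_literal(query_embedding), k),
    )
    return [row[0] for row in cursor.fetchall()]
```

The partial index suits a small, fixed set of filter values; whether the planner actually chooses it depends on the predicate being visible at plan time, so check the result with `EXPLAIN`. Iterative scans are the more general option when filter values vary.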
Journey Context:
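Standard HNSW or IVF indices in libraries like FAISS do not efficiently support filter predicates combined with the vector search (e.g., `category = 'electronics'`). Post-filtering (retrieving the top K=1000 by vector similarity, then applying the filter) loses recall when the filter is selective: if fewer than 10 of those 1000 candidates match, the remaining true top-10 matches are never returned, because they ranked below the cutoff. Pre-filtering with an inverted index before the vector search restricts the search to an arbitrary subset of nodes, which breaks the connectivity the ANN graph relies on and in practice degrades to a brute-force scan of the matching rows. The solution requires an engine that interleaves vector traversal and metadata filtering. pgvector does not do this by itself: a query such as `WHERE category = X ORDER BY embedding <=> query` against an `hnsw` index built with `vector_cosine_ops` is filtered after the index scan, and pgvector's HNSW and IVFFlat indexes cover a single vector column, so there is no multi-column vector-plus-metadata index. The query becomes efficient with a partial index per filter value, partitioning on the filter column, or iterative index scans (pgvector 0.8.0 and later). Common mistake: assuming pgvector applies the `WHERE` clause inside an IVFFlat or HNSW index scan (it does not; without one of those strategies, a selective filter can return fewer rows than the `LIMIT`).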
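The recall loss from post-filtering can be illustrated with a small simulation. This is a sketch using exact (brute-force) cosine ranking in place of a real ANN index, so it isolates the effect of the fixed top-K cutoff only. The data, the category assignment (every 100th item is `electronics`), and all function names are made up for illustration.

```python
import math
import random


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def make_items(n, dim, every=100, seed=0):
    """Random vectors; every `every`-th item gets the rare category."""
    rng = random.Random(seed)
    items = []
    for i in range(n):
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        category = "electronics" if i % every == 0 else "other"
        items.append((i, vec, category))
    return items


def post_filter(items, query, category, k, fetch_k):
    """Rank everything, keep the top fetch_k, then apply the filter."""
    ranked = sorted(items, key=lambda it: cosine_distance(it[1], query))
    candidates = ranked[:fetch_k]
    return [it[0] for it in candidates if it[2] == category][:k]


def exact_filtered(items, query, category, k):
    """Ground truth: apply the filter first, then rank the matching rows."""
    matching = [it for it in items if it[2] == category]
    matching.sort(key=lambda it: cosine_distance(it[1], query))
    return [it[0] for it in matching[:k]]


def recall(found, truth):
    """Fraction of the true results that were actually returned."""
    return len(set(found) & set(truth)) / len(truth) if truth else 1.0


if __name__ == "__main__":
    items = make_items(n=2000, dim=8)
    query = [1.0] + [0.0] * 7
    truth = exact_filtered(items, query, "electronics", k=10)
    for fetch_k in (50, 500, 2000):
        found = post_filter(items, query, "electronics", k=10, fetch_k=fetch_k)
        print(fetch_k, len(found), recall(found, truth))
```

With only 1% of rows matching, a small `fetch_k` rarely contains ten matching rows, so the post-filtered result is a truncated prefix of the true answer; recall only reaches 1.0 once `fetch_k` is large enough to cover all ten true matches, which in the worst case means scanning everything.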
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T23:47:56.670810+00:00— report_created — created