Report #15147
[architecture] Empty result sets or low recall when combining vector similarity search with strict metadata filters
For high-selectivity metadata filters (narrowing to <10k vectors), switch to exact KNN (brute force) on the filtered subset instead of ANN (HNSW), or use pgvector 0.8.0+ iterative index scans for filtered HNSW queries.
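The routing rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not pgvector's or any database's implementation: EXACT_KNN_THRESHOLD, exact_knn, filtered_search and the ann_search callback are hypothetical names, rows are held in memory as (id, vector, metadata) tuples, and the size of the filtered subset is computed directly (a real system would more likely get it from a COUNT query or a planner estimate).

```python
import heapq
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Predicate = Callable[[dict], bool]

# Hypothetical cutoff taken from the "<10k vectors" guideline; tune per workload.
EXACT_KNN_THRESHOLD = 10_000


def exact_knn(query: Vector, rows: list[tuple[int, Vector]], k: int) -> list[int]:
    """Brute-force KNN: score every row and keep the k closest (exact, 100% recall)."""
    nearest = heapq.nsmallest(k, rows, key=lambda row: math.dist(query, row[1]))
    return [row_id for row_id, _ in nearest]


def filtered_search(
    query: Vector,
    rows: list[tuple[int, Vector, dict]],
    predicate: Predicate,
    k: int,
    ann_search: Callable[[Vector, Predicate, int], list[int]],
) -> list[int]:
    """Pre-filter on metadata, then pick exact KNN or ANN by subset size.

    `ann_search` is a hypothetical stand-in for an HNSW/IVF query path,
    e.g. pgvector with iterative index scans enabled.
    """
    # Apply the metadata filter first (pre-filtering).
    subset = [(row_id, vec) for row_id, vec, meta in rows if predicate(meta)]
    if len(subset) < EXACT_KNN_THRESHOLD:
        # Small subset: brute force is cheap and cannot miss true matches.
        return exact_knn(query, subset, k)
    # Large subset: fall back to the approximate index path.
    return ann_search(query, predicate, k)
```

With the threshold at 10,000, a filter that leaves only a handful of rows is answered by brute force and never touches the ANN path. The cutoff itself is a tuning assumption rather than a fixed constant.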
Journey Context:
Standard HNSW and IVF indexes do not combine efficiently with metadata filters. Many vector databases use 'post-filtering': they retrieve a fixed number of nearest neighbors from the index (e.g. 1000) and then discard rows that fail the metadata filter, so the result comes back empty or short when the true matches are sparse and fall outside that candidate list. 'Pre-filtering' (apply the metadata filter first, then search) means searching only the filtered set; when that set is small (<10k), an exact KNN (brute-force) scan over it is fast and guarantees 100% recall, so an ANN index buys little. pgvector 0.8.0+ supports iterative index scans (hnsw.iterative_scan, ivfflat.iterative_scan), which keep scanning the index until enough rows pass the filter or a scan limit is reached, but the fundamental rule remains: do not rely on ANN for small filtered subsets.
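A small simulation can make the failure mode concrete. It assumes an idealized 'index' that returns rows in exact nearest-first order (a real HNSW or IVF scan is approximate), and the helper names post_filter, iterative_scan and pre_filter_exact are hypothetical. The max_scan_tuples parameter loosely mirrors pgvector's hnsw.max_scan_tuples setting, but the loop only models the idea of iterative scanning, not pgvector's actual algorithm.

```python
import heapq
import math
from itertools import islice
from typing import Callable, Iterator, Sequence

Row = tuple[int, Sequence[float], dict]
Predicate = Callable[[dict], bool]


def index_scan(query: Sequence[float], rows: list[Row]) -> Iterator[Row]:
    """Idealized index: yields rows nearest-first.

    Assumption: exact ordering stands in for an approximate HNSW/IVF traversal.
    """
    yield from sorted(rows, key=lambda row: math.dist(query, row[1]))


def post_filter(query, rows: list[Row], predicate: Predicate, k: int, candidate_limit: int) -> list[int]:
    """Post-filtering: take a fixed candidate list from the index, then filter it."""
    candidates = islice(index_scan(query, rows), candidate_limit)
    return [row_id for row_id, _, meta in candidates if predicate(meta)][:k]


def iterative_scan(query, rows: list[Row], predicate: Predicate, k: int, max_scan_tuples: int) -> list[int]:
    """Keep reading the index until k rows pass the filter or the scan budget is spent."""
    hits: list[int] = []
    for scanned, (row_id, _, meta) in enumerate(index_scan(query, rows), start=1):
        if predicate(meta):
            hits.append(row_id)
            if len(hits) == k:
                break
        if scanned >= max_scan_tuples:
            break
    return hits


def pre_filter_exact(query, rows: list[Row], predicate: Predicate, k: int) -> list[int]:
    """Pre-filtering: filter first, then brute-force KNN over the (small) subset."""
    subset = [row for row in rows if predicate(row[2])]
    nearest = heapq.nsmallest(k, subset, key=lambda row: math.dist(query, row[1]))
    return [row_id for row_id, _, _ in nearest]
```

With 5,000 rows at positions [i, 0.0] of which only ids 4000, 4500 and 4999 match, and a query at the origin, post_filter with candidate_limit=1000 returns an empty list, while iterative_scan with a 20,000-tuple budget and pre_filter_exact both return [4000, 4500, 4999]. Shrinking the budget to 4,200 tuples returns only [4000], which is why an iterative scan can still come back short when matches are very sparse.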
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T23:18:34.351601+00:00— report_created — created