Report #9456
[architecture] IVFFlat/HNSW vector search ignores metadata filters, returning wrong results or slow queries
Use an HNSW index and pre-filter the candidate set through B-tree metadata indexes (`WHERE` clauses) before ordering by vector distance; avoid placing highly selective metadata filters directly inside an index-driven vector query.
Journey Context:
Naive approach: run `ORDER BY embedding <-> query LIMIT 10`, then filter in the application. The nearest vectors may fail the metadata check, so the caller has to over-fetch with a larger LIMIT, and the number of rows that survive the filter is unpredictable. Better: `WHERE category = X ORDER BY embedding <-> query`. But HNSW and IVFFlat indexes cannot efficiently combine a vector index scan with B-tree conditions: when the vector index is used, the filter is applied to the rows the index scan returns, so a selective filter can leave fewer rows than the LIMIT, and the planner often picks the wrong plan between the vector index and the metadata index. Solution: use HNSW (not IVFFlat) for better recall, and for high-cardinality metadata (where each filter value matches only a few rows) run the query in two phases: fetch candidate IDs from the metadata index first, then run the vector search within that subset using `WHERE id IN (subquery)` or a `JOIN` against a materialized CTE.
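The single-phase and two-phase queries described above can be sketched in SQL as follows. This is a minimal illustration, not the reporter's actual schema: the table `items`, its columns, the index names, the `'books'` category, and the 3-dimensional vectors are all hypothetical. It assumes PostgreSQL 12 or later (for `AS MATERIALIZED`) and a pgvector version that includes HNSW.

```sql
-- Hypothetical schema: names and dimensions are illustrative only.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id        bigserial PRIMARY KEY,
    category  text NOT NULL,
    embedding vector(3)  -- 3 dimensions only to keep the example small
);

-- B-tree index for the metadata filter, HNSW index for L2 distance (<->).
CREATE INDEX items_category_idx  ON items (category);
CREATE INDEX items_embedding_idx ON items USING hnsw (embedding vector_l2_ops);

-- Single-phase: filter and ordering in one statement.
-- If the planner chooses the HNSW index, the category filter is applied
-- to the rows the index scan returns, so fewer than 10 rows may come back.
SELECT id
FROM items
WHERE category = 'books'
ORDER BY embedding <-> '[0.1,0.2,0.3]'
LIMIT 10;

-- Two-phase: materialize the filtered candidates first (served by the
-- B-tree index), then rank only those rows by distance.
WITH candidates AS MATERIALIZED (
    SELECT id, embedding
    FROM items
    WHERE category = 'books'
)
SELECT id
FROM candidates
ORDER BY embedding <-> '[0.1,0.2,0.3]'
LIMIT 10;
```

In the two-phase form the distance sort runs over the materialized candidate set without the HNSW index, so the result is exact for that subset, but the cost grows with the number of rows the filter matches. It is likely only worthwhile when the filter is selective; comparing both statements with `EXPLAIN ANALYZE` on real data is the safest way to decide.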
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T08:14:26.717504+00:00— report_created — created