Report #87453
[counterintuitive] Do embedding models capture negation and logical operators?
Do not rely on vector similarity search for queries involving negation (e.g., 'jobs that are NOT remote') or complex boolean logic. Use hybrid search (BM25 + vector) or metadata filtering.
Journey Context:
Developers assume that because embeddings capture 'semantic meaning,' they also understand 'not X'. In reality, embedding models often map 'X' and 'not X' to very similar vectors because the two phrases appear in similar contexts in the training data. A vector search for 'movies without violence' will frequently return violent movies: the query vector lands close to the 'violence' cluster, and the logical operator has little effect on where it lands.
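A minimal sketch of the metadata-filtering workaround, assuming each record carries a structured `remote` flag alongside its embedding. The `Job` class, the `search` function, and the three-dimensional vectors are hypothetical and hand-made for illustration; a real system would get its vectors from an embedding model and would usually push the filter down into the vector store. The point is only that the negation is applied as a hard filter and similarity is used for ranking the remaining candidates.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Job:
    title: str
    remote: bool            # structured metadata (hypothetical field)
    embedding: list[float]  # toy vector; normally produced by an embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], jobs: list[Job],
           exclude_remote: bool = False, k: int = 3) -> list[Job]:
    # Apply the negation as a hard metadata filter first...
    candidates = [j for j in jobs if not (exclude_remote and j.remote)]
    # ...then rank only the surviving candidates by vector similarity.
    ranked = sorted(candidates,
                    key=lambda j: cosine(query_vec, j.embedding),
                    reverse=True)
    return ranked[:k]


JOBS = [
    Job("Remote backend engineer", True, [0.9, 0.8, 0.1]),
    Job("On-site backend engineer", False, [0.8, 0.7, 0.2]),
    Job("Remote data analyst", True, [0.3, 0.9, 0.4]),
    Job("On-site office manager", False, [0.1, 0.2, 0.9]),
]

# Hand-made stand-in for the embedding of "backend jobs that are NOT remote".
# It sits next to the remote backend job, mimicking the failure described above.
QUERY_VEC = [0.9, 0.8, 0.1]

naive = search(QUERY_VEC, JOBS)                          # similarity only
filtered = search(QUERY_VEC, JOBS, exclude_remote=True)  # filter, then rank

print([j.title for j in naive])
print([j.title for j in filtered])
```

In this toy data the similarity-only search ranks a remote job first, while the filtered search returns only on-site jobs. The sketch assumes the negated attribute exists as metadata; when it does not, the query usually needs to be parsed or rewritten before retrieval.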
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:22:35.613956+00:00— report_created — created