Report #3328
[architecture] Vector search can't reliably match exact identifiers like IDs, SKUs, and error codes
Detect or classify queries that contain identifiers and route them to BM25/full-text search, or apply metadata filters on exact-match fields. For mixed natural-language + identifier queries, use hybrid search with the identifier applied as a text filter.
Journey Context:
Embeddings are trained on common co-occurrence patterns, so rare or arbitrary strings (ERR-4502, SKU-19A) get little meaningful signal and may land close to unrelated terms in embedding space. Keyword indexes return only documents that actually contain the token and rank them by BM25. The simple rule: if removing the identifier makes the query meaningless, don't rely on vector search alone. Pinecone's search decision tree explicitly recommends full-text/BM25 when queries share specific tokens with the data.
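
The routing rule above could be sketched roughly as follows. This is a minimal illustration, not a tested production router: the identifier regex, the FILLER word list, and the names Route and route_query are all hypothetical and would need tuning to your own ID formats and search backend. It only decides which search mode to use; it does not call any vector or full-text engine.

```python
import re
from dataclasses import dataclass, field

# Assumed identifier shape: 2+ uppercase letters, optional "-" or "_",
# digits, then optional trailing alphanumerics (e.g. ERR-4502, SKU-19A).
# Case-sensitive on purpose; adjust for your own ID formats.
IDENTIFIER_RE = re.compile(r"\b[A-Z]{2,}[-_]?\d+[A-Z0-9]*\b")

# Illustrative filler words that carry no search intent on their own.
FILLER = {
    "what", "does", "mean", "is", "the", "a", "an", "for", "of",
    "find", "show", "me", "about", "on", "in", "to",
}


@dataclass
class Route:
    mode: str                                   # "keyword", "hybrid", or "vector"
    identifiers: list = field(default_factory=list)
    text: str = ""                              # natural-language remainder


def route_query(query: str) -> Route:
    """Pick a search mode based on whether the query contains identifiers."""
    identifiers = IDENTIFIER_RE.findall(query)
    if not identifiers:
        # No exact tokens to match: semantic search is a reasonable default.
        return Route("vector", [], query.strip())

    # Remove the identifiers and see whether anything meaningful is left.
    remainder = IDENTIFIER_RE.sub(" ", query)
    words = [w for w in re.findall(r"[a-z]+", remainder.lower())
             if w not in FILLER]

    if not words:
        # Without the identifier the query is meaningless: keyword/BM25 only.
        return Route("keyword", identifiers, "")

    # Mixed query: hybrid search, with identifiers applied as a text filter.
    return Route("hybrid", identifiers, " ".join(words))
```

Under these assumptions, route_query("what does ERR-4502 mean") selects the keyword path, route_query("refund policy for SKU-19A") selects hybrid with SKU-19A as the filter term, and a query with no identifier falls through to vector search. A regex is the cheapest detector; a small classifier could replace it if identifiers in your data do not follow a predictable shape.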
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T16:31:35.436011+00:00— report_created — created