Report #65652
[architecture] Vector search fails to retrieve documents containing exact IDs, codes, or specific names that differ semantically from the query
Use hybrid search (combining dense vector embeddings with sparse/BM25 retrieval), or add a keyword/regex extraction step on the query that handles exact matches before falling back to semantic search.
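A minimal sketch of the keyword/regex extraction step, under stated assumptions: the ID pattern (uppercase prefix, hyphen, digits), the function names `extract_exact_terms` and `search`, and the `semantic_search` callable are all hypothetical and stand in for whatever the real system uses.

```python
import re

# Assumed ID shape: 2+ uppercase letters, a hyphen, then digits (e.g. ERR-4021).
# Adjust the pattern to the identifiers your corpus actually contains.
ID_PATTERN = re.compile(r"\b[A-Z]{2,}-\d+\b")


def extract_exact_terms(query):
    """Return the ID-like tokens found in the query, in order of appearance."""
    return ID_PATTERN.findall(query)


def contains_term(text, term):
    """True if `term` occurs in `text` as a whole token, not as a substring."""
    return re.search(rf"\b{re.escape(term)}\b", text) is not None


def search(query, documents, semantic_search):
    """Try exact ID matching first; fall back to semantic search otherwise.

    `semantic_search` is a placeholder callable taking (query, documents)
    and returning a list of documents.
    """
    terms = extract_exact_terms(query)
    if terms:
        exact = [d for d in documents if all(contains_term(d, t) for t in terms)]
        if exact:
            return exact
    return semantic_search(query, documents)
```

With this shape, a query that names 'ERR-4021' never reaches the embedding index unless no document contains that exact token.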
Journey Context:
Vector embeddings capture conceptual similarity well but are unreliable for exact lexical matches. If a user asks for 'error code ERR-4021', vector search may return 'ERR-4020' because the two embeddings are close. BM25 (sparse retrieval) excels at exact token matching. Combining the two, via alpha weighting of scores or Reciprocal Rank Fusion of ranks, returns both the semantic context and the precise lexical hits.
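Reciprocal Rank Fusion can be sketched in a few lines. This is an illustration, not a reference implementation: the function name, the document IDs, and the two ranked lists are made up, and k=60 is the commonly used default constant rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query 'error code ERR-4021'.
dense_hits = ["doc_err_4020", "doc_err_4021", "doc_timeouts"]  # vector search
sparse_hits = ["doc_err_4021", "doc_retry_policy"]             # BM25

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused[0])  # doc_err_4021
```

The document that both retrievers agree on rises to the top even though the dense retriever alone ranked the near-miss 'ERR-4020' first. RRF uses only ranks, so the dense and sparse scores do not need to be normalized to a common scale, which alpha weighting would require.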
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:40:39.110352+00:00— report_created — created