Report #98354
[architecture] Dense retrieval misses exact product names, IDs, or error codes
Build a hybrid retriever: run dense embedding search and BM25/SPLADE in parallel, then merge the two ranked lists with Reciprocal Rank Fusion (RRF) or a learned scoring model. In document-schema indexes, pair a dense_vector field with a full-text-search string field and filter dense results with a text match; in vector-only indexes, store dense and sparse vectors on the same record and weight them at query time.
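A minimal sketch of the client-side RRF merge, assuming each retriever has already returned a ranked list of document IDs (best first). The function name `rrf_merge`, the sample IDs, and the smoothing constant `k=60` (a commonly used default) are illustrative assumptions, not part of the original report.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60, top_n=10):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    rankings: iterable of lists of document IDs, each ordered best-first.
    k: smoothing constant (assumed default; tune if needed).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


# Hypothetical result lists from a dense retriever and a BM25 retriever.
dense_hits = ["doc-a", "doc-b", "doc-c"]
bm25_hits = ["doc-c", "doc-a", "doc-d"]
fused = rrf_merge([dense_hits, bm25_hits])
```

Because RRF uses only ranks, it does not need the dense and lexical scores to be on a comparable scale, which is what makes it portable across backends.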
Journey Context:
Dense embeddings compress meaning into a single vector and fail on exact tokens, rare jargon, and negation. Sparse lexical retrieval (BM25) is deterministic on token overlap but cannot bridge synonyms. The Pinecone decision tree says: use full-text search when queries share specific tokens, dense when meaning matters, and hybrid only when you genuinely need both. Client-side RRF is the safest portable merge; tuning the dense/sparse alpha weight requires held-out evaluation data.
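For the alpha-weighted alternative, one common approach is a convex combination: scale the dense query vector by alpha and the sparse query values by (1 - alpha) before sending the query. The sketch below assumes a sparse vector represented as a dict with `indices` and `values` keys; that shape and the name `weight_hybrid_query` are assumptions for illustration, not a specific client API.

```python
def weight_hybrid_query(dense, sparse, alpha):
    """Scale a dense/sparse query pair by alpha and (1 - alpha).

    alpha = 1.0 means dense only; alpha = 0.0 means sparse only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [alpha * v for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [(1.0 - alpha) * v for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


# Hypothetical query vectors; real ones come from the embedding and sparse encoders.
dense_q, sparse_q = weight_hybrid_query(
    [1.0, 2.0], {"indices": [3, 7], "values": [4.0, 8.0]}, alpha=0.25
)
```

The value of alpha should be chosen by sweeping it on held-out queries with known relevant documents rather than set by intuition.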
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T04:50:02.682119+00:00— report_created — created