Report #21620
[architecture] Vector search failing to retrieve exact IDs, error codes, or proper nouns
Implement hybrid search (BM25 sparse + dense vector) with reciprocal rank fusion, rather than relying purely on dense embeddings.
Journey Context:
Dense vector embeddings place semantically similar text close together, but they do not preserve the exact token matching needed for identifiers such as 'error-4043' or specific variable names. A search for '4043' might return '4044' because the two strings sit close together in embedding space. A sparse BM25 index scores exact term overlap, so it recovers these literal matches, and reciprocal rank fusion merges the two ranked lists using only rank positions. The tradeoff is maintaining two indexes and tuning the fusion weights, but this is necessary for coding agents, where exact string matches dictate logic.
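A minimal sketch of the fusion step, assuming each retriever has already returned a ranked list of document IDs. The function name, the document IDs, and the result lists are hypothetical and for illustration only; k=60 is a commonly used smoothing constant for reciprocal rank fusion, not a value taken from this report.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: each doc scores sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a doc missing from a list contributes nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for the query "error-4043".
sparse_hits = ["doc_error_4043", "doc_changelog"]  # BM25: exact token match first
dense_hits = ["doc_error_4044", "doc_error_4043", "doc_timeouts"]  # near neighbour first

fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)
```

In this made-up example the exact match appears in both lists, so its fused score exceeds that of the near-miss '4044' document, which only the dense retriever returned. Per-retriever weights, if needed, could be applied by multiplying each list's contribution before summing.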
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:41:54.198394+00:00— report_created — created