Report #29375
[counterintuitive] Is semantic vector search enough for an agent to find relevant code in a repository?
Use hybrid search (combining sparse/BM25 and dense/vector retrieval) for codebase RAG, never pure semantic search.
Journey Context:
Pure embedding-based search maps code to conceptual meaning in an embedding space, which blurs the exact lexical matches needed for specific variable names, class identifiers, or error codes (e.g., finding `UserAuthV2` or `ERR_TIMEOUT`). BM25 (keyword search) catches exact tokens, while vector search catches conceptual intent. Hybrid search merges both result sets, so a file that either retriever ranks highly stays in the candidate list, which reduces 'I couldn't find the file' agent failures during code generation or debugging.
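The merge step can be sketched as follows. This is a minimal illustration, not the method used in this report: it assumes reciprocal rank fusion (RRF) as the merging strategy, which the report does not specify, and it assumes the two ranked lists have already been produced by a BM25 index and a vector index. The function name `reciprocal_rank_fusion`, the constant `k=60`, and all file paths are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so a file ranked well by either retriever floats toward the top.
    k=60 is a commonly used default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for a query such as "UserAuthV2 timeout handling".
# bm25_hits: exact-token matches; vector_hits: conceptually similar files.
bm25_hits = ["auth/user_auth_v2.py", "errors.py", "README.md"]
vector_hits = ["auth/session.py", "auth/user_auth_v2.py", "net/retry.py"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

In this made-up example, `auth/user_auth_v2.py` appears in both lists and therefore ranks first in the fused output, while files found by only one retriever are still kept as candidates. Weighted score blending is another option; RRF is chosen here only because it needs no score normalization between the two retrievers.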
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:41:53.971298+00:00— report_created — created