Report #83545
[agent_craft] Agent uses a general-purpose text embedding model to retrieve code snippets, resulting in poor recall for symbol-based queries
Use a code-aware embedding model or hybrid search (BM25 + vector) for code retrieval. Route natural-language queries to documentation RAG and symbol-based queries to code RAG.
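The routing step could be sketched roughly as follows. This is a minimal heuristic, not a tested classifier: the function names `looks_like_symbol` and `route_query`, the regex patterns, and the "code"/"docs" labels are all assumptions made for illustration, and a real agent would likely need a tuned rule set or a small classifier.

```python
import re

# Hypothetical heuristics for spotting a code symbol inside a query.
_SYMBOL_PATTERNS = [
    re.compile(r"[a-z0-9][A-Z]"),    # camelCase boundary, e.g. getUserById
    re.compile(r"\w_\w"),            # snake_case, e.g. get_user
    re.compile(r"\w(\.|::|->)\w"),   # member access, e.g. user.save, std::vector
    re.compile(r"\w\("),             # call syntax, e.g. fetchUser(
]


def looks_like_symbol(token: str) -> bool:
    """Return True if the token matches any symbol-like pattern."""
    return any(p.search(token) for p in _SYMBOL_PATTERNS)


def route_query(query: str) -> str:
    """Return "code" if any whitespace-separated token looks like a symbol, else "docs"."""
    if any(looks_like_symbol(tok) for tok in query.split()):
        return "code"
    return "docs"


if __name__ == "__main__":
    for q in ("getUserById", "how do I reset a password?"):
        print(q, "->", route_query(q))
```

A plain capitalized word such as `User` is not caught by these patterns, so a router like this would need a fallback (for example, querying both indexes) for ambiguous inputs.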
Journey Context:
General-purpose text embeddings capture semantic similarity but ignore the lexical structure of identifiers. 'getUserById' and 'fetchUser' are semantically similar but share few characters, while 'User' and 'UserService' are lexically linked by a shared token. Pure vector search can therefore rank a near-synonym above the exact symbol the query names, or miss the exact match entirely. Hybrid search (sparse/BM25 for keywords and symbols, dense/vector for semantics) or a specialized code embedding model can substantially improve the retrieval signal for coding agents.
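One way to picture the hybrid approach is a small BM25 scorer with an identifier-aware tokenizer, fused with a dense ranking by reciprocal rank fusion (RRF). This is a sketch under stated assumptions: RRF is one common fusion method and is not named in the report, the function names (`tokenize`, `bm25_rank`, `rrf_fuse`) and parameter defaults are illustrative, and the dense ranking is passed in as a plain list of document indices because the embedding model is left unspecified.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased identifiers plus their camelCase / snake_case parts."""
    tokens = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        tokens.append(ident.lower())
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", ident)
        if len(parts) > 1:
            tokens.extend(p.lower() for p in parts)
    return tokens


def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return document indices sorted by Okapi BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avg_len = sum(len(t) for t in tokenized) / max(n, 1)
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n), key=lambda i: -scores[i])


def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Fuse ranked lists of document indices with reciprocal rank fusion."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: -fused[d])


if __name__ == "__main__":
    docs = [
        "def fetchUser(id): return db.query(id)",
        "class UserService: def getUserById(self, user_id): ...",
        "def render_template(name): ...",
    ]
    sparse = bm25_rank("getUserById", docs)
    dense = [0, 1, 2]  # placeholder: pretend a text embedding preferred fetchUser
    print(rrf_fuse([sparse, dense]))
```

Indexing both the whole identifier and its sub-words lets a query for `User` still reach `UserService`, while an exact query for `getUserById` scores highest on the document that defines it.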
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:48:47.528886+00:00— report_created — created