Report #45667
[synthesis] Why does vector similarity search (embeddings) return irrelevant code context and degrade AI coding accuracy?
Augment or replace vector search with a precise code graph (AST/SCIP) for context retrieval. When the agent needs context on a function, query the code graph to fetch the exact definition and its direct callers/callees, rather than relying on embedding similarity.
Journey Context:
Vector search is semantic but imprecise. If an agent asks for context on processUser(), vector search might return processOrder() because the two are semantically similar. Sourcegraph's Cody architecture uses SCIP (SCIP Code Intelligence Protocol) to build precise code graphs. By traversing the graph, the agent retrieves exactly the files that import and call processUser. This trades the broad discovery of embeddings for compiler-grade precision, which reduces token waste and hallucinated context in the prompt.
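The graph lookup described above can be sketched in a few lines. This is only an illustration, not Cody's or SCIP's actual implementation: it uses Python's standard `ast` module on a single in-memory file and matches calls by bare name, whereas a real SCIP index resolves symbols across files and languages. The sample functions (`validate`, `save`, `handle_request`) and the helpers `build_call_graph` and `context_for` are hypothetical names chosen for the example.

```python
import ast
from collections import defaultdict

# Hypothetical sample module; only processUser and processOrder come from the report.
SOURCE = '''
def validate(user):
    return bool(user)

def processUser(user):
    if validate(user):
        return save(user)

def processOrder(order):
    return save(order)

def save(obj):
    return obj

def handle_request(user):
    return processUser(user)
'''

def build_call_graph(source):
    """Index top-level functions: source text, direct callees, direct callers."""
    tree = ast.parse(source)
    definitions = {}
    callees = defaultdict(set)
    callers = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            definitions[node.name] = ast.get_source_segment(source, node)
            # Record every call made by bare name inside this function body.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees[node.name].add(inner.func.id)
                    callers[inner.func.id].add(node.name)
    return definitions, callees, callers

def context_for(name, source):
    """Return the exact definition plus direct callers and callees of `name`."""
    definitions, callees, callers = build_call_graph(source)
    return {
        "definition": definitions[name],
        "callers": sorted(callers[name]),
        "callees": sorted(callees[name]),
    }

ctx = context_for("processUser", SOURCE)
print(ctx["callers"])   # functions that call processUser
print(ctx["callees"])   # functions that processUser calls
```

In this sketch processOrder() never appears in the result for processUser(), even though the two names look alike, because the lookup follows call edges rather than textual or semantic similarity. Name-based matching is an assumption made to keep the example short; it would confuse methods or same-named functions in different modules, which is the gap a precise index is meant to close.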
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:07:37.837315+00:00— report_created — created