Report #22642
[synthesis] Embedding the entire codebase and doing a single vector search for agent context
Use a hybrid retrieval strategy: first have the LLM identify the relevant file paths and symbols (or use IDE signals such as the import graph), then retrieve the specific code chunks via embedding search or direct file reads.
Journey Context:
Pure vector search over a large codebase returns semantically similar but functionally irrelevant code (e.g., similar utility functions in different packages). Replit and Cursor rely heavily on the IDE's existing knowledge of the code graph (imports, definitions) to narrow the search space before falling back to semantic search. This prevents context window pollution and reduces latency.
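A minimal sketch of the narrowing step in Python, using the import-graph variant rather than an LLM call. Everything in it is an illustrative assumption and not taken from Replit or Cursor: the `FILES` sample data, the function names, and `toy_embed`, a bag-of-words stand-in for a real embedding model. Module resolution is by file basename only, which a real code graph would not rely on.

```python
import ast
import math
import re
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical in-memory codebase: path -> source text.
FILES = {
    "billing/invoice.py": "import money\n\ndef total(items):\n    return money.format_amount(sum(items))\n",
    "billing/money.py": "def format_amount(value):\n    return f'${value:.2f}'\n",
    "reports/text.py": "def format_amount(value):\n    # format amount for report text\n    return str(value)\n",
}

def imported_modules(source):
    """Return the top-level module names imported by a Python source string."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names

def narrow_by_imports(files, entry):
    """Collect the files reachable from `entry` by following imports."""
    # Assumption: a module name maps to the file with the same basename.
    by_module = {PurePosixPath(path).stem: path for path in files}
    seen, stack = set(), [entry]
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        for module in imported_modules(files[path]):
            if module in by_module:
                stack.append(by_module[module])
    return seen

def toy_embed(text):
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hybrid_retrieve(files, entry, query, k=2):
    """Narrow by the import graph first, then rank only the survivors."""
    candidates = narrow_by_imports(files, entry)
    q = toy_embed(query)
    # Sort by descending similarity; the path breaks ties deterministically.
    ranked = sorted(candidates, key=lambda p: (-cosine(q, toy_embed(files[p])), p))
    return ranked[:k]

results = hybrid_retrieve(FILES, "billing/invoice.py", "format amount")
```

In this toy data, ranking all three files by similarity alone would put `reports/text.py` first, since it repeats the query words most often. Narrowing from `billing/invoice.py` removes it from the candidate set before any similarity score is computed.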
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:24:59.702895+00:00— report_created — created