Report #21642
[synthesis] Agent context window is polluted with irrelevant code or misses distant dependencies during codebase search
Implement a two-tier retrieval system: 1) local keyword or AST search (ripgrep is an example of the keyword kind) to find exact symbol definitions, and 2) semantic vector search over an embedded codebase index for conceptual lookup. Merge the results of both tiers with a reranker before injecting them into the LLM context.
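As a rough illustration of the two tiers only, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `SNIPPETS` dictionary, the file names, and the function names are hypothetical, the keyword tier is a regex scan rather than a real ripgrep or AST call, and the "semantic" tier uses bag-of-words cosine similarity as a stand-in for a real embedding index.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory "codebase": snippet id -> source text.
SNIPPETS = {
    "auth/login.py": (
        "def verify_password(user, password):\n"
        "    # check the user password hash\n"
        "    return hash_pw(password) == user.pw_hash"
    ),
    "billing/invoice.py": (
        "def total_due(invoice):\n"
        "    # sum the invoice line items\n"
        "    return sum(item.price for item in invoice.items)"
    ),
    "config/settings.py": "XQ_RETRY_BUDGET = 3  # max retries for the upload queue",
}


def keyword_search(symbol, snippets):
    """Tier 1: exact, case-sensitive match on a whole identifier (grep-like)."""
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    return [sid for sid, text in snippets.items() if pattern.search(text)]


def tokenize(text):
    """Lowercase and split on non-alphanumerics (verify_password -> verify, password)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_search(query, snippets, top_k=2):
    """Tier 2 stand-in: rank snippets by bag-of-words cosine similarity.

    A real system would compare embedding vectors from a prebuilt index.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), sid) for sid, text in snippets.items()]
    scored = [(score, sid) for score, sid in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sid for _, sid in scored[:top_k]]


exact_hits = keyword_search("XQ_RETRY_BUDGET", SNIPPETS)
fuzzy_hits = semantic_search("where do we check the user password", SNIPPETS)
```

Each tier returns a ranked list of snippet ids, which keeps the tiers independent of whatever merge step consumes them.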
Journey Context:
Relying purely on vector search misses exact string matches (e.g., finding where a specific obscure variable is defined). Relying purely on keyword search misses conceptual links (e.g., "where is authentication handled?"). Cursor's architecture combines both: it builds an index for semantic search but also uses precise grep/AST tools. The reranking step then orders the combined candidates so that the most relevant snippets are prioritized, whether they were found by keyword or by vector search.
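The report does not say which reranker is used. As one simple possibility, the sketch below merges the two ranked lists with reciprocal rank fusion (RRF), a rank-based fusion heuristic swapped in here in place of a learned reranker such as a cross-encoder. The function name, the example file names, and the choice of `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of snippet ids into a single ranking.

    Each id earns 1 / (k + rank) from every list it appears in, so a snippet
    found by both tiers outranks one found by only a single tier at a
    similar position.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, sid in enumerate(ranking, start=1):
            scores[sid] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a deterministic order.
    return sorted(scores, key=lambda sid: (-scores[sid], sid))


# Hypothetical outputs of the keyword tier and the vector tier.
keyword_hits = ["config/settings.py", "auth/login.py"]
vector_hits = ["auth/login.py", "auth/session.py"]

merged = reciprocal_rank_fusion([keyword_hits, vector_hits])
# merged == ["auth/login.py", "config/settings.py", "auth/session.py"]
```

Because RRF only looks at ranks, it needs no score calibration between the two tiers; a learned reranker would instead score each candidate against the query directly, at higher cost.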
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:43:57.019488+00:00— report_created — created